Friday, October 26, 2012

Neo4j 1.9.M01 - Self-managed HA

Welcome everyone to the first Milestone of the Neo4j 1.9 releases! In this release we're introducing a simplification of HA cluster operations, and a set of excellent improvements to our query language, Cypher.

Neo4j HA - fewer moving parts

Setting up a Neo4j High Availability cluster requires configuring a companion coordinator cluster. The coordinator cluster is used for master election, tracking the most recent transaction ID, and discovering the current master.
With Neo4j 1.9.M01, cluster members communicate directly with each other, using an implementation of the Paxos consensus protocol for master election. We expect that, once perfected, this simplified approach will provide the same behavior you rely on in production today, with easier operation.
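A full Paxos implementation is beyond a blog post, but the majority-quorum rule at its heart is easy to show. The following Java sketch is not Neo4j's election code and leaves out the entire Paxos message exchange; MajorityQuorum, quorum and isElected are hypothetical names. It only shows why, within one election round, a candidate needs votes from more than half of the cluster: any two majorities share at least one member, so two candidates cannot both win the same round.

```java
// Sketch only: the majority-quorum arithmetic behind Paxos-style elections.
// This is NOT Neo4j's election code; class and method names are hypothetical.
final class MajorityQuorum {

    /** Smallest number of members that forms a majority of clusterSize. */
    static int quorum(int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        return clusterSize / 2 + 1;
    }

    /** A candidate wins once a majority of the whole cluster has voted for it. */
    static boolean isElected(int votesForCandidate, int clusterSize) {
        return votesForCandidate >= quorum(clusterSize);
    }

    public static void main(String[] args) {
        // A 3-member cluster needs 2 votes, so it can still elect a master
        // while one member is down.
        System.out.println(quorum(3));        // 2
        System.out.println(isElected(2, 3));  // true
        System.out.println(isElected(1, 3));  // false
    }
}
```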

To find out more about the new setup, look to our docs for an updated overview, operational explanation and a setup tutorial.

DISCLAIMER: This is the first outing for the new HA setup and is intended for evaluation and feedback purposes only. Please do not rely on it in production until Neo4j 1.9.GA.


I made a small setup video, using a simple script for a 3-member cluster running on one machine.
Setting up a local HA cluster in Neo4j 1.9 from Peter Neubauer on Vimeo.


Feedback is, as always, warmly welcomed on the Neo4j Google Group!

CYPHER

New Pattern Matcher

In this release we've added a new pattern matcher to Cypher that uses the bidirectional traversal framework introduced in Neo4j 1.8, resulting in significantly improved performance on complex graphs. Cypher now also supports setting properties directly from parameter maps, providing a useful syntactic shortcut and cleaner, more readable code.

WITH SKIP/LIMIT/ORDER BY

In the same way that limiting returned results lets you focus on the data that matters, Cypher can now apply SKIP, LIMIT and ORDER BY in the WITH clause to cut the number of intermediate results, leaving less of the graph to process and making queries faster. See:


START n=node(3)
MATCH n--m
WITH m
ORDER BY m.name desc
LIMIT 1
MATCH m--o
RETURN o.name
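To make the effect of ORDER BY and LIMIT inside WITH concrete, here is a rough in-memory Java analogue of the query above. The adjacency map, the node names and the WithPipeline.run helper are hypothetical; the point is only that the second MATCH expands the single surviving m instead of every neighbor of n.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Rough in-memory analogue of:
//   START n=node(3) MATCH n--m WITH m ORDER BY m.name desc LIMIT 1
//   MATCH m--o RETURN o.name
// The graph is a hypothetical undirected adjacency map (each edge listed both ways).
final class WithPipeline {

    static List<String> run(Map<Integer, List<Integer>> adj, Map<Integer, String> names, int start) {
        // MATCH n--m, then WITH m ORDER BY m.name DESC LIMIT 1: only one m survives
        List<Integer> kept = adj.getOrDefault(start, List.of()).stream()
                .sorted(Comparator.comparing((Integer m) -> names.get(m)).reversed())
                .limit(1)
                .collect(Collectors.toList());

        // MATCH m--o RETURN o.name: expand just the surviving m
        List<String> result = new ArrayList<>();
        for (int m : kept) {
            for (int o : adj.getOrDefault(m, List.of())) {
                result.add(names.get(o));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = Map.of(
                3, List.of(1, 2),
                1, List.of(3, 4),
                2, List.of(3, 5),
                4, List.of(1),
                5, List.of(2));
        Map<Integer, String> names = Map.of(1, "Anna", 2, "Bob", 3, "Start", 4, "Carl", 5, "Dora");
        // "Bob" sorts last alphabetically, so only node 2 is expanded
        System.out.println(run(adj, names, 3)); // [Start, Dora]
    }
}
```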





Community contributions to Cypher

Being able to aggregate a number of results into a single value has been a popular feature request in Cypher. To achieve that, we've added a reduce (or fold) operation that lets you aggregate data, as shown here:

start n=node(...)

return reduce( total=0, x in collect(n) : total + x.fun) as total_fun
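If folds are new to you, this Java sketch spells out what the reduce expression computes: start from the initial value, then combine it with each element in turn. ReduceFold and its Item.fun field are hypothetical stand-ins for nodes with a fun property; the second method shows the same fold using Java's three-argument Stream.reduce.

```java
import java.util.List;

// Sketch of the fold behind: reduce(total=0, x in collect(n) : total + x.fun)
// Item and its fun field are hypothetical stand-ins for nodes with a "fun" property.
final class ReduceFold {

    static final class Item {
        final int fun;
        Item(int fun) { this.fun = fun; }
    }

    // Explicit loop: the accumulator starts at 0 and each element updates it.
    static int totalFun(List<Item> items) {
        int total = 0;                 // total=0
        for (Item x : items) {         // x in collect(n)
            total = total + x.fun;     // total + x.fun becomes the new total
        }
        return total;
    }

    // Same fold with Stream.reduce(identity, accumulator, combiner).
    static int totalFunStream(List<Item> items) {
        return items.stream().reduce(0, (total, x) -> total + x.fun, Integer::sum);
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(3), new Item(4), new Item(5));
        System.out.println(totalFun(items));        // 12
        System.out.println(totalFunStream(items));  // 12
    }
}
```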


Thank you, Wes Freeman!

For this release, a huge thanks goes to community contributor Wes Freeman for tirelessly adding new features into the Cypher language.

He contributed the above reduce() function as well as string functions, and fixed the inconsistent head/tail behavior. Wes also updated the Neo4j console with syntax highlighting and multiline input. It now works on IE9, too!


Wizard that you are, Wes, we hope you enjoy the Kymera Wand remote control from Michael Hunger. Your Neo4j community T-shirt is on its way!

The Release docs

As always, all changes are listed in the changelog, and all Cypher changes can be found here.


OK, enough talk. Go get Neo4j 1.9.M01 and let us know what you think. Again, please do not use this release in production. This is exclusively for early access to HA with simplified ops.

Enjoy,

/peter


Thursday, October 25, 2012

Zen Graph Visualizations - What can you do?

As you probably know by now, we plan to run GraphConnect on Nov 6 2012 in San Francisco.
For the event, the Neo4j community team devised some interesting hacking challenges.
One challenge is to record an interaction graph based on OpenBeacon RFID tracking, much like the one we worked on during JRubyConf.EU.
The other one has a backstory:
Andreas and Michael were chatting about graph datasets, their representation and especially conference data related to GraphConnect. Andreas suggested raking the visualizations into sand, much like you'd do in a Zen garden. So Michael started to search the web for JavaScript libraries that could achieve such an effect, but came across something much more incredible: the Kickstarter-backed Zen-Table project by Simon Hallam, an autonomous Zen garden in a box (on a table).
Zen-Table is an ingenious mechanism consisting of a microcontroller, strong neodymium magnets and a plotter-like mechanism that moves the "drawing head" across sand on top of a coffee or desktop table. Of course the table is programmable: it can draw static images from an SD card in the sand, and also dynamic renderings sent via a USB port.

The drawing is controlled with a programming language called table-script, which has some basic commands such as Robot.LineTo(x,y,final-speed), Robot.LineToSmooth(x,y), Robot.Clear[X|Y](), Robot.Home() and a few more. Simon created a JavaScript library to generate table-script on demand, as well as an image tracer. You can see all this in action at zen-table.com.
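One way to start on the graph-rendering challenge is to turn a graph layout into table-script. This Java sketch places nodes on a circle and emits one LineTo per step of a walk through them. Only the command names Robot.Home and Robot.LineTo come from the description above; TableScriptSketch, the coordinate range, the speed value and the exact argument formatting are assumptions, and so is the premise that the head cannot be lifted (which is why the graph is drawn as one continuous walk).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical table-script generator. Robot.Home and Robot.LineTo are the command
// names mentioned in the post; argument formatting, coordinate range and speed
// units here are assumptions, not the documented table-script syntax.
final class TableScriptSketch {

    /** Places n graph nodes evenly on a circle, returning rounded {x, y} positions. */
    static int[][] circleLayout(int n, int cx, int cy, int radius) {
        int[][] pos = new int[n][2];
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n;
            pos[i][0] = (int) Math.round(cx + radius * Math.cos(angle));
            pos[i][1] = (int) Math.round(cy + radius * Math.sin(angle));
        }
        return pos;
    }

    /**
     * Emits one LineTo per step of a walk over node indices. Assuming the head
     * cannot be lifted, every move leaves a groove, so the walk is the drawing.
     */
    static List<String> drawWalk(int[][] pos, int[] walk, int speed) {
        List<String> script = new ArrayList<>();
        script.add("Robot.Home()");
        for (int node : walk) {
            script.add(String.format(Locale.ROOT, "Robot.LineTo(%d,%d,%d)",
                    pos[node][0], pos[node][1], speed));
        }
        return script;
    }

    public static void main(String[] args) {
        int[][] pos = circleLayout(4, 100, 100, 50);
        // Trace the 4-cycle 0-1-2-3-0 around the circle
        for (String line : drawWalk(pos, new int[] {0, 1, 2, 3, 0}, 20)) {
            System.out.println(line);
        }
    }
}
```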


First ever Zen-Desktop table shipped, unboxed and tested

During our crazy meetup week in the Bay Area, we arranged for the first Zen-Desktop-Table to be shipped to us. We couldn't wait to unbox it and try it out. After a great day at the awesome HeavyBit office, Heroku's add-on provider incubator (thanks, James & Jason!), the unboxing was a worthy highlight of the evening party there.

As we love working with Heroku and their incredible employees, we spontaneously decided to place the Zen-Table in the caring hands of our Heroku friends as a small token of friendship. And obviously they liked it.

Live at GraphConnect: a Challenge for You


So our desktop Zen-Table will be there at GraphConnect, waiting for you in our Community Café. We're super excited about having this opportunity to hack with it.
We thought about combining the Zen-Table with a Raspberry Pi to allow external access via HTTP, running a JavaScript engine on the Pi to generate the necessary table-script on demand.
Your challenge now is to come up with graph renderings that are well suited to visualizing a small or medium-sized dataset in the sand of the Zen-Table. You can try out your ideas at zen-table.com with some static graph data.
We look forward to your suggestions and will invite the authors of the two best ideas to GraphConnect.

Sunday, October 21, 2012

REST::Neo4p - A Perl "OGM"

This is a guest post by Mark A. Jensen, a DC-area bioinformatics scientist. Thanks a lot, Mark, for writing the impressive Neo4j Perl library and taking the time to document it thoroughly.
You might call REST::Neo4p an "Object-Graph Mapping". It uses the Neo4j REST API at its foundation to interact with Neo4j, but what makes REST::Neo4p "Perly" is the object oriented approach. Creating node, relationship, or index objects is equivalent to looking them up in the graph database and, for nodes and relationships, creating them if they are not present. Updating the objects by setting properties or relationships results in the same actions in the database, and returns errors when these actions are proscribed by Neo4j. At the same time, object creation attempts to be as lazy as possible, so that only the portion of the database you are working with is represented in memory.
The idea is that working with the database is accomplished by using Perl5 objects in a Perl person's favorite way. Despite the modules' "REST" namespace, the developer should almost never need to deal with the actual REST calls or the building of URLs herself. The design uses the amazingly complete and consistent self-describing information in the Neo4j REST API responses to keep URLs under the hood.

The Backstory

I am a bioinformatics scientist contracted to a large US government research initiative. Our clients were interested in alternatives to RDBMS to represent the big data we manage. As a long-time object-oriented Perler with only a smattering of Java, I wanted to investigate Neo4j, but on my own terms. My CPAN and other searches came up with some experimental approaches to working with the Neo4j REST service, but there was little true object functionality and robustness in place.
I was surprised to see so little Neo4j activity in the open-source Perl domain, in the face of many pleading requests in the Neo4j community for a decent Perl interface. So I took on the challenge, and I am definitely hoping for positive feedback and constructive criticism.

The Basics

Download and install

REST::Neo4p is really a family of Perl modules/classes. Get it and install it like this:
 $ cpan
 cpan> install REST::Neo4p
That's it. The cpan utility comes with every Perl installation. If you haven't used it before, you will be asked some setup questions, for most of which the defaults will be fine.
Tests: The installation process will run a complete suite of tests. To take advantage of these, enter the full API URL and port of your (running!) Neo4j engine when prompted. (It defaults to http://127.0.0.1:7474.)
If any tests fail, please report a bug right here.
The cpan utility will also help you install the dependencies, other CPAN modules that make REST::Neo4p go. There are only a few. A trick to get them all to install automatically is the following:
 $ cpan
 cpan> o conf prerequisites_policy follow
 cpan> install REST::Neo4p

Connect and manipulate

I'm going to assume you're familiar with Perl 5 objects. If you're not, check out
straight from the camel's mouth.
I'll walk through a simple model from the bioinformatics domain, nucleotides in DNA (follow link for a nice introduction).
Nucleotides are the letters which spell the words (genes) encoded in DNA. There are four that are most important, and are referred to as A, C, G, and T. These letters stand for their chemical names. DNA can change or mutate when one of these letters is changed to another. The letters are the nodes in our model.
Mutations are changes in the DNA from one letter to another. These changes are themselves classified as either transitions or transversions. The details don't matter here, except that mutations from one letter to another are the relationships in our model.
First, include the REST::Neo4p modules, then connect to the database:
#-*- perl -*-
use REST::Neo4p;
use strict;
use warnings;

eval {
    REST::Neo4p->connect('http://127.0.0.1:7474');
};
ref $@ ? $@->rethrow : die $@ if $@;

Errors, including communication errors, are transmitted as exception objects from a hierarchy (see REST::Neo4p::Exceptions for a full description). The last line here just checks whether an exception was thrown at connect time and, if so, dies with a message. More sophisticated handling is possible and encouraged.
Now to create nodes along with indexes to easily handle them. The new constructor does the creation and returns the objects for the Neo4p entity classes Index, Node, and Relationship.
my @node_defs = 
    (
     { name => 'A', type => 'purine' },
     { name => 'C', type => 'pyrimidine' },
     { name => 'G', type => 'purine'},
     { name => 'T', type => 'pyrimidine' }
    );
my $nt_types = REST::Neo4p::Index->new('node','nt_types');
my $nt_names = REST::Neo4p::Index->new('node','nt_names');
my @nts = my ($A,$C,$G,$T) = map { REST::Neo4p::Node->new($_) } @node_defs;

$nt_names->add_entry($A, 'fullname' => 'adenine');
$nt_names->add_entry($C, 'fullname' => 'cytosine');
$nt_names->add_entry($G, 'fullname' => 'guanine');
$nt_names->add_entry($T, 'fullname' => 'thymine');

for ($A,$G) {
    $nt_types->add_entry($_, 'type' => 'purine');
}

for ($C,$T) {
    $nt_types->add_entry($_, 'type' => 'pyrimidine');
}

In general, you pass the Node and Relationship constructors a hash reference that maps property names to values. To create an Index, the first argument is the index type ('node' or 'relationship'), followed by the index name. Use the add_entry method to add an object to an index with a tag => value pair.
On to relationships. We create a relationship index to corral the mutation types, and express the mutation types as relationship objects. Using the relate_to method from node objects, we create relationships between pairs of nodes with a pretty natural syntax:
my $nt_mutation_types = REST::Neo4p::Index->new('relationship','nt_mutation_types');

my @all_pairs;
my @a = @nts;
while (@a) {
    my $s = shift @a;
    push @all_pairs, [$s, $_] for @a;
}

for my $pair ( @all_pairs ) {
    if ( $pair->[0]->get_property('type') eq
         $pair->[1]->get_property('type') ) {
        $nt_mutation_types->add_entry(
            $pair->[0]->relate_to($pair->[1],'transition'),
            'mut_type' => 'transition'
            );
        $nt_mutation_types->add_entry(
            $pair->[1]->relate_to($pair->[0],'transition'),
            'mut_type' => 'transition'
            );
    }
    else {
        $nt_mutation_types->add_entry(
            $pair->[0]->relate_to($pair->[1],'transversion'),
            'mut_type' => 'transversion'
            );
        $nt_mutation_types->add_entry(
            $pair->[1]->relate_to($pair->[0],'transversion'),
            'mut_type' => 'transversion'
            );
    }
}

The relate_to method returns the relationship object that it creates. Here we pass that return value directly to the index's add_entry method.
If you prefer, you can use a relationship constructor:
 my $transition = REST::Neo4p::Relationship->new($A, $G, 'transition');
Relationship properties can be added in the constructor, or after the fact:
 $transition->set_property('involved_types' => 'purines');
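For readers who think more in Java than Perl, the pair enumeration and classification done by the loop above can be sketched like this, without any database calls. MutationPairs.mutations is a hypothetical helper that returns "from->to:kind" strings; with the four nucleotides it yields 12 directed mutations, of which 4 are transitions (A<->G and C<->T) and 8 are transversions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of the Perl pair loop: no Neo4j calls, just the logic.
final class MutationPairs {

    /** Returns "from->to:kind" strings, two per unordered pair (one per direction). */
    static List<String> mutations(Map<String, String> typeByName) {
        List<String> names = new ArrayList<>(typeByName.keySet());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {   // each unordered pair exactly once
                String a = names.get(i);
                String b = names.get(j);
                String kind = typeByName.get(a).equals(typeByName.get(b))
                        ? "transition"      // purine<->purine or pyrimidine<->pyrimidine
                        : "transversion";   // purine<->pyrimidine
                result.add(a + "->" + b + ":" + kind);   // like $pair->[0]->relate_to($pair->[1], ...)
                result.add(b + "->" + a + ":" + kind);   // and the reverse direction
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> nts = new LinkedHashMap<>();   // insertion order = A, C, G, T
        nts.put("A", "purine");
        nts.put("C", "pyrimidine");
        nts.put("G", "purine");
        nts.put("T", "pyrimidine");
        List<String> muts = mutations(nts);
        long transitions = muts.stream().filter(m -> m.endsWith(":transition")).count();
        System.out.println(muts.size() + " mutations, " + transitions + " transitions"); // 12 mutations, 4 transitions
    }
}
```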

Perl garbage collection removes objects from memory, but does not delete them from the database. You must do this explicitly, using the remove method on any entity:
 for my $reln ($A->get_relationships) {
   $reln->remove;
 }
 $A->remove;
 $nt_types->remove;
 # etc.

Retrieve and query

The REST::Neo4p module itself contains a few methods for retrieving database items directly. The most useful is probably get_index_by_name. Index objects have find_entries for retrieving the items in the index.
use REST::Neo4p;
use strict;
use warnings;

REST::Neo4p->connect('http://127.0.0.1:7474');
my $idx = REST::Neo4p->get_index_by_name('nt_names','node');
my ($node) = $idx->find_entries(fullname => 'adenine');
my @nodes = $idx->find_entries('fullname:*');

Note that find_entries always returns a list (hence my ($node) = ... above), and it supports either an exact search (key => value) or a Lucene query string.
REST::Neo4p also supports the CYPHER query REST API. Entities are returned as REST::Neo4p objects. Query results are always sent to disk, and results are streamed via an iterator that is meant to imitate the commonly used Perl database interface, DBI.
Here we print a table of nucleotide pairs that are involved in transversions:
my $query = REST::Neo4p::Query->new(
  'START r=relationship:nt_mutation_types(mut_type = "transversion")
   MATCH a-[r]->b
   RETURN a,b'
  );
$query->execute;
while (my $result = $query->fetch) {
   print $result->[0]->get_property('name'),'->',
         $result->[1]->get_property('name'),"\n";
}

The query is created with the CYPHER code, then executed. The fetch iterator retrieves the returned rows (as array references) one at a time until the result is exhausted. Again, the result is not held in memory, so queries returning many rows should not present a big problem.
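To illustrate the idea of a DBI-style fetch loop over a streamed result, here is a Java sketch in which rows are read one at a time from a Reader rather than loaded as a whole. It is an analogy, not REST::Neo4p's implementation; StreamingCursor, its tab-separated row format and fetch returning null at the end are assumptions chosen to mirror the Perl loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch of a DBI-style cursor: each fetch() returns one row, so the full
// result never has to be held in memory. Not REST::Neo4p's actual code.
final class StreamingCursor implements AutoCloseable {
    private final BufferedReader in;

    StreamingCursor(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Returns the next row as its tab-separated columns, or null once the stream is exhausted. */
    String[] fetch() {
        try {
            // BufferedReader keeps only a small fixed-size buffer, not the whole result
            String line = in.readLine();
            return line == null ? null : line.split("\t", -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In practice the Reader would be opened on the response written to disk;
        // a StringReader keeps this sketch self-contained.
        try (StreamingCursor query = new StreamingCursor(new StringReader("A\tC\nA\tT\n"))) {
            String[] row;
            while ((row = query.fetch()) != null) {   // cf. while (my $result = $query->fetch)
                System.out.println(row[0] + "->" + row[1]);
            }
        }
    }
}
```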

Production-quality Features

My goal for REST::Neo4p is to go beyond the existing Perl experiments with Neo4j and create modules that are robust enough for production use (yes, people DO use Perl in production!). This meant several things:
  • Be robust and feature-rich enough that people will want to try it.
  • Be responsive enough to bugs that people will see it maintained.
  • Incorporate unit and integration tests into the user installation.
  • Have complete documentation with tested examples.
  • Create bindings to as many of the Neo4j REST API functions as is possible for a guy with a real job.
  • Be concerned with performance by being sensitive to memory use, and taking advantage of streaming and batch capabilities.
  • Capture both Perl and Neo4j errors in a catchable way.

There isn't space here to discuss these points in detail, but here are some highlights and links:
  • REST::Neo4p::Agent is the class where the REST calls get done and the endpoints are captured. It subclasses the widely-used LWP::UserAgent and can be used completely independently of the object handling modules, if you want to roll your own Neo4j REST interface.
  • The batch API can be used very simply by including the REST::Neo4p::Batch mixin. Visit the link for detailed examples.
  • When paths are returned by CYPHER queries, they are rolled up into their own simple container object REST::Neo4p::Path that collects the nodes and relationships with some convenience methods.
  • You can choose to have REST::Neo4p create property accessors automatically, allowing the following:
     $node->set_property( name => 'Fred' );
     print "Hi, I'm ".$node->name;
    
  • REST::Neo4p::Index allows index configuration (e.g., full-text Lucene) as provided by the Neo4j REST API.

I hope Perlers will give REST::Neo4p a try and find it useful. Again, I appreciate the time you take to report bugs.

Saturday, October 13, 2012

Neo4j at SpringOne 2GX 2012 in Washington, DC


Meet us during this week's (Oct 15-18) SpringOne 2GX 2012 conference in Washington D.C. where we're proudly participating as a Gold Sponsor.

Our entertaining CEO Emil Eifrem and the Spring-savvy Neo4j team consisting of Michael Hunger, Stefan Armbruster and Lasse Westh-Nielsen will be there all week. 

Of course we'd love for you to attend one of our awesome talks at the conference, or just pull us aside for a one-on-one chat in the hallways or at our booth. Can't make the conference? Don't worry, you can join us at a meetup graciously hosted by AOL.

With the Spring Data projects' release scheduled for Monday, there will be a lot of buzz around the topic of "NOSQL for Spring Developers", which we will naturally take part in. So if you have any questions about NOSQL solutions in an enterprise Java context, feel free to come by our booth or stalk us. At the booth we'll also hold the usual iPad drawing and have awesome swag and material to hand out on a first-come, first-served basis.

During the conference the long awaited Spring Data book (Modern Data Access for Enterprise Java) written by the Spring Data project leads and published by O'Reilly will also become available as an e-book.

Talks & Meetups

Session: Wed, Oct 17, 2:45 PM: Addressing the Big Data Challenge With a Graph

Emil Eifrem and Michael Hunger

This session will provide a close look at the graph model and will offer best use cases for effective, cost-efficient data storage and accessibility. Takeaways will include an understanding of the graph database model, how it compares to document and relational databases, and why it is best suited for the storage, mapping and querying of connected data. To be included is a hands-on guide to Spring Data Neo4j, which provides straightforward object persistence into the Neo4j graph database.


Session: Thu, Oct 18, 8:30 AM: Grails goes Graph

Stefan Armbruster

The past few months have seen the establishment of the Grails Neo4j plugin, which is now in a phase of stabilization and is already used in production. This session will provide a demo covering the usage of Neo4j in multiple setup scenarios, such as embedded and REST. We will show how a domain model is decomposed into Neo4j's node space, and the potential of graph-specific capabilities like traversals and Cypher queries. The session will also offer a case study covering our experiences running the Grails Neo4j combo under high load in a real production system.


Meetup: Tue, Oct 16 7:00 PM, Spring Data Neo4j – Graph Power with Spring

Michael Hunger, Stefan Armbruster
AOL Campus, HQ 5th Floor, 22000 AOL Way, Dulles, VA 20166

Our presentation introduces the different aspects of Spring Data Neo4j and shows applications in several example domains. During the session we walk through the creation of an engaging sample application that starts with the setup and annotating the domain objects. We see the usage of Neo4j Template and the powerful repository abstraction. After deploying the application to a cloud PaaS we execute some interesting query use-cases on the collected data.


We look forward to seeing you at any of these occasions and exploring the intriguing world of connected data together.

Thursday, October 11, 2012

Using (Spring Data) Neo4j for the Hubway Data Challenge

Using Spring Data Neo4j it was incredibly easy to model and import the Hubway Challenge dataset into a Neo4j graph database, to make it available for advanced querying and visualization.

The Challenge and Data

Tonight @graphmaven pointed me to the boston.com article about the Hubway Data Challenge.

Hubway is a bike-sharing service which is currently expanding worldwide. For the Data Challenge they offer CSV data for their 95 Boston stations and about half a million bike rides up to the end of September. The challenge is to answer some posted questions and develop great visualizations (or UIs) for the Hubway data set. The challenge is also supported by MAPC (Metropolitan Area Planning Council).

Getting Started

As midnight had just passed and Spring Data Neo4j 2.1.0.RELEASE had been built unofficially during the day, I thought it would be a good exercise to model the data using entities and import it into Neo4j. So the first step was the domain model, which is pretty straightforward:

Based on the Spring Data book example project, I created the pom.xml with the dependencies (org.springframework.data:spring-data-neo4j:2.1.0.RELEASE) and the Spring application context files.

Import Stations

The Station was the easiest entity to model and import, so I started there. It has several names: one of them (terminalName) is the unique identifier, and the station name itself can be searched with a full-text index. As Hubway also provides geo-information for the stations, we use the Neo4j-Spatial index provider so we can later integrate spatial searches (near, bounding box, etc.).

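import java.util.Locale;

import org.springframework.data.annotation.TypeAlias;
import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;
import org.springframework.data.neo4j.support.index.IndexType;

@NodeEntity
@TypeAlias("Station")
public class Station {
    @GraphId Long id;

    @Indexed(numeric = false)
    private Short stationId;
    @Indexed(unique = true)          // terminalName is the unique identifier
    private String terminalName;

    // full-text index, so stations can be searched by name
    @Indexed(indexType = IndexType.FULLTEXT, indexName = "stations")
    private String name;

    boolean installed, locked, temporary;

    double lat, lon;
    // WKT point for the Neo4j-Spatial index provider (near, bounding box queries)
    @Indexed(indexType = IndexType.POINT, indexName = "locations")
    String wkt;

    protected Station() {   // no-arg constructor used by Spring Data Neo4j when loading entities
    }

    public Station(Short stationId, String terminalName, String name,
                   double lat, double lon) {
        this.stationId = stationId;
        this.name = name;
        this.terminalName = terminalName;
        this.lon = lon;
        this.lat = lat;
        // WKT expects "POINT(lon lat)"; Locale.ROOT guarantees '.' as the decimal separator
        this.wkt = String.format(Locale.ROOT, "POINT(%f %f)", lon, lat);
    }

    public Short getStationId() {
        return stationId;
    }
}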

I used the JavaCSV library for reading the data files. The importer just creates a Spring context and retrieves the service, with injected dependencies and declarative transaction management. The actual import is then as simple as creating entity instances and passing them to the Neo4jTemplate for saving.

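// Bootstrap: load the Spring context and fetch the transactional ImportService bean
ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("classpath:META-INF/spring/application-context.xml");
ImportService importer = ctx.getBean(ImportService.class);

CsvReader stationsFile = new CsvReader(stationsCsv);   // stationsCsv: path to the stations CSV file
stationsFile.readHeaders();
importer.importStations(stationsFile);
stationsFile.close();


// ImportService.java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.csvreader.CsvReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.neo4j.support.Neo4jTemplate;
import org.springframework.transaction.annotation.Transactional;

public class ImportService {

    @Autowired private Neo4jTemplate template;

    // stations by Hubway id, used later to connect trips to their stations
    private final Map<Short, Station> stations = new HashMap<>();

    @Transactional
    public void importStations(CsvReader stationsFile) throws IOException {
        // id,terminalName,name,installed,locked,temporary,lat,lng
        while (stationsFile.readRecord()) {
            Station station = new Station(asShort(stationsFile, "id"),
                                          stationsFile.get("terminalName"),
                                          stationsFile.get("name"),
                                          asDouble(stationsFile, "lat"),
                                          asDouble(stationsFile, "lng"));
            template.save(station);
            stations.put(station.getStationId(), station);
        }
    }

    // Helpers for parsing numeric CSV columns by header name
    private static Short asShort(CsvReader csv, String column) throws IOException {
        return Short.valueOf(csv.get(column));
    }

    private static double asDouble(CsvReader csv, String column) throws IOException {
        return Double.parseDouble(csv.get(column));
    }
}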

Import trips

Importing the trips themselves is only a little more involved. In modeling the trip, I chose to create a RelationshipEntity called Action to represent the start or end of a trip. That entity connects the trip to a station and holds the date at which it happened. During the import I found a number of inconsistent data rows (missing stations), so those were skipped. As half a million entries are a bit too much for a single transaction, I split the import into batches of 5k trips each.

@Transactional
public boolean importTrips(CsvReader trips, int count) throws IOException {
    //"id","status","duration","start_date","start_station_id",
    // "end_date","end_station_id","bike_nr","subscription_type",
    // "zip_code","birth_date","gender"
    while (trips.readRecord()) {
        Station start = findStation(trips, "start_station_id");
        Station end = findStation(trips, "end_station_id");
        if (start==null || end==null) continue;

        Member member = obtainMember(trips);

        Bike bike = obtainBike(trips);

        Trip trip = new Trip(member, bike)
                        .from(start, date(trips.get("start_date")))
                        .to(end, date(trips.get("end_date")));
        template.save(trip);
        count--;
        if (count==0) return true;
    }
    return false;
}
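The post doesn't show the loop that calls importTrips repeatedly. A minimal sketch of such a driver, assuming the contract visible in the code above (consume up to count rows, return true if the batch filled up, false once the CSV is exhausted), could look like this; BatchDriver, runBatches and fromIterator are hypothetical names, and an in-memory iterator stands in for the CsvReader. In the Spring version, each importTrips call should be made on the bean from outside it (for example while (importer.importTrips(trips, 5000)) {}), because @Transactional is applied by Spring's proxy and a call from inside the same object would bypass it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical driver for a batch method with importTrips' contract.
final class BatchDriver {

    /** Calls importBatch(batchSize) until it reports that the input is exhausted. */
    static int runBatches(IntPredicate importBatch, int batchSize) {
        int batches = 0;
        boolean more = true;
        while (more) {
            more = importBatch.test(batchSize);   // one call = one batch (one transaction in Spring)
            batches++;
        }
        return batches;
    }

    /** Hypothetical stand-in for importTrips: consumes up to count records per call. */
    static IntPredicate fromIterator(Iterator<?> records, List<Object> sink) {
        return count -> {
            while (records.hasNext()) {
                sink.add(records.next());        // template.save(trip) in the real importer
                count--;
                if (count == 0) return true;     // batch full, more may follow
            }
            return false;                        // input exhausted
        };
    }

    public static void main(String[] args) {
        List<Object> saved = new ArrayList<>();
        int calls = runBatches(fromIterator(IntStream.range(0, 12).iterator(), saved), 5);
        System.out.println(calls + " batches, " + saved.size() + " records"); // 3 batches, 12 records
    }
}
```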

First look at the data

The import ran for about two minutes and produced a Neo4j database (227MB) that contains all those connections. I uploaded it to our sample dataset site. Get a Neo4j server and put the contents of the zip file into data/graph.db; then it is easy to visualize the graph and run some interesting queries. I list a few below, but they should only be seen as a starting point: feel free to explore and find new and interesting insights.

Stations most often used by a user

START n=node(205) 
 MATCH n-[:TRIP]->(t)-[:`START`|END]->stat 
 RETURN stat.name,count(*) 
 ORDER BY count(*) desc LIMIT 5; 

+------------------------------------------------+
| stat.name                           | count(*) |
+------------------------------------------------+
| "South Station - 700 Atlantic Ave." | 22       |
| "Post Office Square"                | 21       |
| "TD Garden - Legends Way"           | 10       |
| "Boylston St. at Arlington St."     | 5        |
| "Rowes Wharf - Atlantic Ave"        | 5        |
+------------------------------------------------+
5 rows
31 ms 

Most beloved bikes

START bike=node:Bike("bikeId:*") 
  MATCH bike-[:BIKE]-trip 
  RETURN bike.bikeId,count(*) 
  ORDER BY count(*) DESC LIMIT 5;

+------------------------+
| bike.bikeId | count(*) |
+------------------------+
| "B00145"    | 1074     |
| "B00114"    | 1065     |
| "B00538"    | 1061     |
| "B00490"    | 1059     |
| "B00401"    | 1057     |
+------------------------+
5 rows
2906 ms
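For readers less familiar with Cypher aggregation, this Java sketch mimics what RETURN bike.bikeId, count(*) ORDER BY count(*) DESC LIMIT 5 does to a stream of matches: group, count, sort by count descending, keep the top k. The input list is made-up sample data, not the Hubway results, and TopCounts.topK is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogue of "RETURN x, count(*) ORDER BY count(*) DESC LIMIT k".
final class TopCounts {

    static List<Map.Entry<String, Long>> topK(List<String> values, int k) {
        // group identical values and count them, like RETURN x, count(*)
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        // ORDER BY count(*) DESC LIMIT k (ties keep first-seen order)
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical sample: one entry per trip that used the bike
        List<String> bikeUses = List.of("B00145", "B00114", "B00145", "B00538", "B00145", "B00114");
        System.out.println(topK(bikeUses, 2)); // [B00145=3, B00114=2]
    }
}
```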

Heroku

The data can also be easily added to a Heroku Neo4j Add-On and from there you can use any programming language and rendering framework (d3, jsplumb, raphael, processing) to visualize the dataset.

What's next

Our next steps are to import the supplied Boston shapefile and the stations into the Neo4j database as well, connect them with the trip data and create a cool visualization. I'm counting on @maxdemarzi to make it awesome. Another path to follow is to craft more advanced Cypher queries for exploring the dataset and to make them and their results available.

Boston Hubway Data-Challenge Hackathon

Hubway will host a Hack Day at The Bocoup Loft in Downtown Boston on Saturday, October 27, 2012. Register here and spread some graph love.


The Source-Code is available here on GitHub and Max de Marzi wrote a great follow-up post visualizing the results.

Friday, October 5, 2012

Follow The Data - FEC Campaign Data Challenge


In politics, people are often advised to "follow the money" to understand the forces influencing decisions. As engineers, we know we can do that and more by following the data.

Inspired by some innovative work by Dave Fauth, a Washington DC data analyst, we arranged a workshop to use FEC Campaign data that had been imported into Neo4j.

FEC Campaign Finance Data

Every Sunday, the FEC updates its campaign finance data sets for the current two-year election period plus the five (5) most recent two-year election periods. The data sets include:
  • all individuals registered as candidates for President, House, or Senate
  • all registered committees engaged in political fundraising
  • all individual contributions greater than $200
In addition, there are extra files covering transactions between committees, plus some that associate records with each other (ooh look, relationships!).

After exploring some evolutionary import strategies (starting with the most direct, then iterating), we settled on an approach which structured the data to look like this:
Campaign Finance Data in a Graph

Query Challenge

With the data imported, and a basic understanding of the domain model, we then challenged people to write Cypher queries to answer the following questions:
  1. All presidential candidates for 2012
  2. Most mythical presidential candidate
  3. Top 10 Presidential candidates according to number of campaign committees
  4. Find President Barack Obama
  5. Lookup Obama by his candidate ID
  6. Find Presidential Candidate Mitt Romney
  7. Lookup Romney by his candidate ID
  8. Find the shortest path of funding between Obama and Romney
  9. List the 10 top individual contributions to Obama
  10. List the 10 top individual contributions to Romney
Care to give the challenge a try? OK, then follow the steps on the GitHub project site to clone the importers. You'll want to run the related importer like so:

./bin/fec2graph --force --importer=RELATED

Then just start up Neo4j and open a browser to http://localhost:7474 to query away. If you're new to Cypher read through the Neo4j Manual Section on Cypher to learn the basics of querying a graph.

Submit your queries to me at andreas@neotechnology.com by next Thursday, and we'll pick a winner from the correct entries. The prize? A free pass to GraphConnect, of course! Coming this November 5 & 6 in San Francisco, GraphConnect is a fantastic conference devoted to graph databases.

Want a hint? 

Alrighty. Let's take a look at #2. After successfully listing all candidates for the first query, you could page through the listing to look for names that seem... just off. Use SKIP and LIMIT after RETURN to page through the long listing:

start candidate=node:candidates('CAND_ID:*') 
where candidate.CAND_OFFICE='{fill this in}' AND candidate.CAND_ELECTION_YR='{this too}' 
return candidate.CAND_NAME skip 100 limit 100;

Once you spot one of the many candidate names that isn't real, you can query for it directly:
start candidate=node:candidates(CAND_NAME="CLAUS, SANTA")
return candidate;
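As a quick illustration of how SKIP and LIMIT page through a listing, the following Java sketch applies the same two steps to an in-memory list; Paging.page is a hypothetical helper and the integer rows stand in for candidate names. Cypher only guarantees row order when the query has an ORDER BY, so adding ORDER BY candidate.CAND_NAME after the RETURN (before SKIP and LIMIT) keeps the pages predictable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of what "RETURN ... SKIP s LIMIT l" does to an ordered result list.
final class Paging {

    /** Drops the first skip rows, then keeps at most limit rows. */
    static <T> List<T> page(List<T> rows, int skip, int limit) {
        return rows.stream().skip(skip).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 250 hypothetical rows; integers stand in for candidate names
        List<Integer> rows = IntStream.range(0, 250).boxed().collect(Collectors.toList());
        for (int skip = 0; skip < rows.size(); skip += 100) {
            List<Integer> p = page(rows, skip, 100);
            System.out.println("SKIP " + skip + ": " + p.size() + " rows, first " + p.get(0));
        }
    }
}
```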

Cypher Masters

From our recent workshop, the winners are:
  • Matt Tyndal
  • Lou Kosak
  • Pengchao Wang
Congratulations, and thanks to everyone who joined us for the event. With the announcement of next week's winner we will include solutions to the challenge. Good luck!

Always,
Andreas

Tuesday, October 2, 2012

Neo4j 1.8 Release - Fluent Graph Literacy

Available immediately, Neo4j 1.8 offers a delightful experience for reading and writing graph data with the simple expressiveness of the Cypher language. Whether you're just discovering the social power of Facebook's Open Graph or are building your own Knowledge Graph for Master Data Management, speaking in graph is easy with Cypher. It is the key to making sense of data with Neo4j.
Consider some everyday examples about books.

What to read?

Through a lovely summer of lounging around you've read through a stack of paperbacks. The starting academic season has you in a more serious mood, so you browse through the classics you never actually read in college wondering which is worth catching up on. You can narrow the list, but not decide. Generally, our friends make good recommendations since we tend to associate with similar-minded people. Ask them.
"Well, starting with me, go to each of my friends to check whether they liked Infinite Jest, On the Road, or The Right Stuff, then pick the most liked."
Cypher was conceived as shorthand for answering questions just like that: a place to start, a "path" to traverse, then some values to peek at. The book recommendation could look like this:
START me=node:node_auto_index(name="Andreas")
MATCH (me)-[:knows]->(friends)-[r:likes]->(books)
WHERE books.title IN ["Infinite Jest", "On the Road", "The Right Stuff"]
RETURN books.title, count(r)
Read through that carefully and you'll easily be able to decipher the meaning. The basics of reading data with Cypher compose nicely into larger, richer queries. Read more in Neo4j's online manual to learn the details.
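If it helps to see what the recommendation query computes, here is a minimal Python sketch over a tiny hypothetical in-memory graph (the `KNOWS` and `LIKES` dictionaries and the friends in them are invented for illustration). It walks the same path as the MATCH clause, keeps only the titles named in the WHERE clause, and counts likes per title, which is the grouping that `count(r)` implies. It models the query's meaning only; it is not how Neo4j executes it.

```python
from collections import Counter

# Hypothetical in-memory graph: start node -> list of end nodes, per relationship type.
KNOWS = {
    "Andreas": ["Heather", "Sam", "Lee"],
}
LIKES = {
    "Heather": ["Infinite Jest", "On the Road"],
    "Sam": ["Infinite Jest", "Dune"],
    "Lee": ["The Right Stuff", "Infinite Jest"],
}


def recommend(me, candidates, knows, likes):
    """Mirror of: MATCH (me)-[:knows]->(friends)-[r:likes]->(books)
    WHERE books.title IN candidates RETURN books.title, count(r)."""
    counts = Counter()
    for friend in knows.get(me, []):          # (me)-[:knows]->(friends)
        for title in likes.get(friend, []):   # (friends)-[r:likes]->(books)
            if title in candidates:           # WHERE books.title IN [...]
                counts[title] += 1            # count(r), grouped by title
    return counts


if __name__ == "__main__":
    shortlist = {"Infinite Jest", "On the Road", "The Right Stuff"}
    # most_common() sorts by count, so the first entry is "the most liked".
    for title, n in recommend("Andreas", shortlist, KNOWS, LIKES).most_common():
        print(title, n)
```

Books that no friend likes simply never appear in the counts, which matches the query returning no row for them.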

Changing the story

Somehow, magically, information has to exist before you can ask for it. We like to say that Neo4j is whiteboard friendly: what you draw is what you store. With Cypher, data becomes normal-sounding statements, particularly when compared to relational databases. I've got fairly nerdy friends, yet none of them declare that they have a foreign key relationship with ISBN 0316066524. They pretty much just say they like "Infinite Jest."
To create data about myself, my friend Heather, a book, our friendship and her appreciation of the book:
CREATE (me {name:"Andreas"}), (heather {name:"Heather Yorkston"}), 
(jest {title:"Infinite Jest"}),
(me)-[:knows]->(heather), (heather)-[:likes]->(jest)
We could keep track of how much people liked a particular book by adding a star rating, like so:
start heather=node:node_auto_index(name='Heather Yorkston')
match (heather)-[r:likes]->(book) where book.title="Infinite Jest" 
SET r.rating=5
If Heather tells me that she has never even read the book, we can remove the "likes" relationship like this:
start heather=node:node_auto_index(name='Heather Yorkston')
match (heather)-[r:likes]->(book) where book.title="Infinite Jest" 
DELETE r
That's a small sample of how Cypher performs basic database operations: creating, reading, updating and deleting. There's much more to delve into. Get comfy, then read up on Cypher in Neo4j's online manual.
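To make the shape of those operations concrete, here is a minimal Python sketch of a hypothetical in-memory property graph (`TinyGraph`, `Rel` and their methods are invented for illustration and are not a Neo4j API). It shows two points from the examples: a relationship can carry its own properties, so `SET r.rating=5` annotates the "likes" itself, and `DELETE r` removes only the relationship while both nodes stay in place.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """A typed, directed relationship that can carry its own properties."""
    start: str
    rel_type: str
    end: str
    props: dict = field(default_factory=dict)


class TinyGraph:
    """Hypothetical in-memory stand-in for a property graph (not Neo4j)."""

    def __init__(self):
        self.nodes = {}  # node key -> property dict
        self.rels = []   # list of Rel

    def create_node(self, key, **props):           # CREATE (key {...})
        self.nodes[key] = props

    def create_rel(self, start, rel_type, end):    # CREATE (a)-[:type]->(b)
        rel = Rel(start, rel_type, end)
        self.rels.append(rel)
        return rel

    def match(self, start, rel_type, **end_props):
        """MATCH (start)-[r:rel_type]->(end) WHERE end has all end_props."""
        return [r for r in self.rels
                if r.start == start and r.rel_type == rel_type
                and all(self.nodes[r.end].get(k) == v
                        for k, v in end_props.items())]

    def delete_rel(self, rel):                     # DELETE r (nodes remain)
        self.rels = [r for r in self.rels if r is not rel]


if __name__ == "__main__":
    g = TinyGraph()
    g.create_node("me", name="Andreas")
    g.create_node("heather", name="Heather Yorkston")
    g.create_node("jest", title="Infinite Jest")
    g.create_rel("me", "knows", "heather")
    g.create_rel("heather", "likes", "jest")
    for r in g.match("heather", "likes", title="Infinite Jest"):
        r.props["rating"] = 5                      # SET r.rating=5
    print(g.match("heather", "likes", title="Infinite Jest"))
    for r in g.match("heather", "likes", title="Infinite Jest"):
        g.delete_rel(r)                            # DELETE r
    print(len(g.rels), sorted(g.nodes))
```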

Fine Print

This release of Neo4j includes the following highlights:
  • Zero-downtime rolling upgrades in HA clusters, for nicer administrative ops
  • Streamed responses to REST API requests, for faster remote access
  • Bi-directional traversals, branch state and path expanders in the traversal framework, for even faster queries
  • Support in the Cypher language for writing graph data and updating auto-indexes, see above ;)
  • Support for explicit transactions in neo4j-shell, on the command line and through the web
For a full summary of changes in this release, please review the change-log, which is also included in the download. Even better, join me for a live webinar to review all the latest features of Neo4j 1.8.

Upgrading

As an incremental service release, Neo4j 1.8 builds upon the previous 1.6 and 1.7 releases, with full backward compatibility. It does not require any explicit upgrade to persistent stores created using Neo4j 1.6 and 1.7 installations. Please see the deployment section of the manual for more detail, including instructions for upgrading installations running Neo4j versions released before 1.6.

Get Neo4j 1.8

Neo4j 1.8 is available for:
Really, I do hope you join us at GraphConnect in November. It will be a great time and I'd love to chat with you in person about the joy of graphs.

Always,
Andreas

Monday, October 1, 2012

MYOB Neo4J Coding Competition


Guest Author: Aidan Casey, Solutions Architect, MYOB Australia

Last week marked the end of the MYOB Neo4J coding competition. This was an internal competition for the development team in the Accountants Division of MYOB, to develop a customer relationship system for accountants using node.js and Neo4J. MYOB is one of the largest ISVs in Australia, and the team in the Accountants Division is focused on developing line-of-business applications for accounting practices.

A coding competition with a difference!
I wanted to have a level playing field for the competition, so what better to throw at a bunch of Microsoft developers than a Neo4J, Node.js and Heroku challenge? The competition ran for 8 weeks, and the challenge was to build an online CRM system that ingested a bunch of text files representing data from a typical accounting practice. The business domain was very familiar to the team, but the technologies were all new.
To add another twist, points were awarded to the people within the team who made the biggest community contributions over the 8 weeks (MYOB ‘brown bag’ webinar sessions, Yammer discussion threads and gists on GitHub). I wanted this to be a very open open-source competition!

Why Neo4J?
When you dig deeper and analyse the data that an accounting practice uses, it’s all based around relationships: an accounting practice has employees, employees manage a bunch of clients, and these clients are often related (husband and wife, family trust etc). The competition gave the team a chance to dip their toes into the world of graph databases and to see how naturally we could work with the data structures.

And the winner is .... Safwan Kamarrudin!
I’m pleased to announce that Safwan Kamarrudin is the winner and proud owner of a new iPad! Safwan’s solution, entitled “Smörgåsbord”, pulled together some really cool node.js modules, including the awesome node-neo4j, socket.io and async. Safwan made a massive contribution to the competition community through Yammer posts, publishing GitHub Gists and running brown bag sessions here in the office.


Accountants Division program manager Graham Edmeads presenting Safwan with his prize!

An interview with the winner!

Qn – So where did you get the name “Smörgåsbord”? Are you a big fan of cold meat and smelly cheese?
I chose the name because the competition asked contestants to use a smorgasbord of technologies. Plus, I thought it would be cool to have umlauts in the name.

Qn – Where can we find your solution on GitHub?

Qn – Complete this sentence – Neo4J is completely awesome because ….
Data is becoming more inter-connected and social nowadays. While “relational” databases can be used to build such systems, they are definitely not the right tool for the job due to their one-size-fits-all nature (despite the name, relational databases are anything but relational). Modelling inter-connected data requires a database that is by nature relational and schema-free, not to mention scalable! And in the land of graph databases, in my opinion there is no database technology that even comes close to Neo4J in terms of its features, community and development model.

Qn – What in your opinion is the biggest challenge to wrapping your head around Graph database concepts?
For someone who is more used to relational databases, the differences between nodes and tables need some getting used to. In a graph database, all nodes are essentially different and independent of each other. They just happen to belong to some indices or are related to other nodes.
This also relates to the fact that nodes of a similar type may not have a fixed schema, which can be good or bad depending on how you look at it.
Another subject that I had to grapple with was whether it makes sense to denormalize data in Neo4J. In a NoSQL database, normalization has no meaning per se. In some cases, data normalization even negates the benefits of NoSQL. Specifically, many NoSQL databases don't have the concept of joins, so normalizing data entails having to make multiple round trips to the database from the application tier or resorting to some sort of map-reduce routine, which is inefficient and over-engineered. Moreover, normalization assumes that there's a common schema shared between different types of entities, and having a fixed schema is antithetical to NoSQL.

Finally a word of thanks!
I’d like to say a huge thanks to Jim Webber, Chief Scientist at Neo Technology, for helping me launch the coding competition. Jim was struck down with chickenpox just hours before the competition was launched, but he still managed to join me online to launch it and take the team through the infamous Dr Who use case. You are a legend Jim, many thanks!