Wednesday, August 24, 2011

Graph Processing v. Graph Databases - Jim Webber

Our Chief Scientist and speaker extraordinaire Jim Webber has just published a blog post on graph processing.

He discusses big data and how Hadoop and Pregel lean toward data analytics (OLAP). Neo4j, on the other hand, "optimizes storage and querying of connected data for online transaction processing, OLTP, scenarios."



He then identifies the advantages of using each tool, and concludes:

"If you need OLTP and deep insight (OLAP-style) in near real-time at enterprise scale then Neo4j is a sensible choice. For niche problems where you can afford high latency in exchange for higher throughput, then the graph processing platforms like Pregel or Hadoop could be beneficial. But it’s important to understand that they are not the same."

Check out the full post here, and let us know what you all think!

Friday, August 19, 2011

Spring Data Graph 1.1.0 Released

Spring Data Graph 1.1.0 with Neo4j support released

After the first public release of Spring Data Graph in April 2011 we mainly focused on user feedback. With improved documentation around the tooling and an upgraded AspectJ version we addressed many of the AspectJ issues that were reported by users. With the latest STS and Eclipse, and hopefully with IDEA 11, it is possible to develop Spring Data Graph applications without the red wiggles. To further ease development we also provided sample build scripts for Ant/Ivy and a plugin for Gradle.

Of course we kept pace with the development of Neo4j and currently use the latest stable release, Neo4j 1.4.1. During the last months of Neo4j development, improved querying support (Cypher, Gremlin) was one of the important aspects, so we strove to support it on all levels. It is now possible to execute Cypher queries from Spring Data Graph repositories and from the Neo4j Template, but also as part of dynamic field annotations and via the introduced entity methods. The same goes for Gremlin scripts.

What's possible with this new expressive power? Let's take a look. For example, in a repository:
	
public interface PersonRepository extends GraphRepository<Person>, NamedIndexRepository<Person> {

    @Query("start team=(%team) match (team)-[:persons]->(member) return member")
    Iterable<Person> findAllTeamMembers(@Param("team") Group team);

    @Query(value = "g.v(team).out('persons')", type = QueryType.Gremlin)
    Iterable<Person> findAllTeamMembersGremlin(@Param("team") Group team);
}
The Neo4j Template API got a complete overhaul, which resulted in far fewer, more focused methods. The advanced query result handling capabilities (type conversion, mapping, single results, handler, etc.) are now implemented using a more fluent API. This new API is available for all types of queries, whether index lookups, graph traversals, Cypher queries or Gremlin scripts.
	template.query("start n=(0) match n-->m return m", null).to(Node.class);

	template.execute("g.v(0).out", null).to(Node.class);

	template.lookup("relationship", "name", "rel1").to(String.class, new PropertyContainerNameConverter()).single();

	template.traverse(referenceNode, traversalDescription).handle(new Handler<Path>() {
	    public void handle(Path value) {
	        final String name = (String) value.endNode().getProperty("name", "");
	        resultSet.add(name);
	    }
	});
The REST API wrapper was also refreshed internally and gained support for querying with Cypher and Gremlin remotely. This makes both capabilities available when running the object graph mapping and the Neo4j Template against a remote Neo4j REST server.

Many thanks to the community for the valuable feedback, the code contributions and the discussions. The collaboration between the SpringSource and Neo Technology teams was enjoyable, as always.

Please check out the current release from Maven Central or from SpringSource.org. If you would like to discuss the Spring Data Graph project, make sure to visit the Spring Forums. We host the project publicly on GitHub for you to fork, comment on and contribute to.

We want to give you a few glimpses of the future roadmap.

We are going to host a webinar on Sept. 8th to give a quick intro to Spring Data Graph.

Spring Data Graph will be rebranded to "Spring Data Neo4j" as this is what it is about: "Support for the Neo4j Graph Database in a SpringFramework environment." The first signs of this are already visible in the changed package structure.

We will focus on an additional mapping-based implementation that also works without AspectJ. Another major focus will be the remote REST API, which becomes more and more important with the availability of hosted Neo4j services at PaaS providers.

The Spring Data Graph Guide Book will be published as an InfoQ Mini Book and will be available as a printed version at the Spring One conference in October. Neo Technology will be present at Spring One to talk about NOSQL, graph databases and Spring Data Neo4j. We also hope to contribute some unexpected events and technologies to the conference. So stay tuned.

Monday, August 15, 2011

Recap: Getting Started with Neo4j | Webinar Series #2

Hey everyone

For all of you who weren't able to make our webinar, Getting Started with Neo4j, we've edited the recording and put together a summary of our Q&A session.

Of course, for any other questions, feel free to subscribe to our user list. We answer support questions extremely fast.



Q: Can you go over more complex specific querying in the HTTP console?
A: Michael Hunger hosts a more in-depth exploration of querying with Cypher in this Neo4j Screencast. Also, the Neo4j Manual has a good section on Cypher.
For Gremlin scripting, TinkerPop has a ton of great information on the project website.

Q: How does auto indexer work?
A: The auto-indexer can be configured to add indexing entries based on specific node properties. When a node with an auto-index property is saved, an index entry will be saved for it. For more details, see the Neo4j Manual section on Auto Indexing.
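
As a rough illustration of the behaviour described above (not Neo4j's actual implementation or API), the toy Ruby sketch below keeps an in-memory index that gains an entry whenever a node carrying a configured property is saved. The `ToyAutoIndexer` class, its methods and the sample data are all hypothetical, invented for this example.

```ruby
# Toy model of auto-indexing. All names are hypothetical, not Neo4j API.
class ToyAutoIndexer
  def initialize(indexed_keys)
    @indexed_keys = indexed_keys            # property keys configured for auto-indexing
    @index = Hash.new { |h, k| h[k] = [] }  # [key, value] => list of node ids
  end

  # Called whenever a node is saved: add an entry for each configured property.
  def save(node_id, properties)
    properties.each do |key, value|
      next unless @indexed_keys.include?(key)
      @index[[key, value]] << node_id
    end
  end

  # Look up node ids by an indexed property value.
  def lookup(key, value)
    @index.fetch([key, value], [])
  end
end

indexer = ToyAutoIndexer.new(["name"])
indexer.save(1, "name" => "andreas", "age" => 42) # made-up sample data
indexer.save(2, "name" => "peter")

p indexer.lookup("name", "andreas") # "name" is auto-indexed, so node 1 is found
p indexer.lookup("age", 42)         # "age" is not configured, so nothing is found
```

Only properties named in the configuration produce index entries; everything else is stored on the node but cannot be looked up through the index.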

Q: What about SPARQL?
A: SPARQL is a query language specifically for RDF, which is a particular type of graph. You can use Neo4j as an RDF store, and then run SPARQL queries against it, by using TinkerPop. Davey Suvee has written a nice blog post that does just that.

Q: And where is the GA version?
A: The latest stable version of Neo4j (our GA, or General Availability release) and milestone version (interim development release) are always available from http://neo4j.org/download/.

Q: How do I run webadmin using High Availability version?
A: Webadmin is always available from any edition of Neo4j - Community, Advanced or Enterprise. In High Availability mode, each data node in the cluster will have a webadmin available for monitoring.

Q: What is the diff between milestone vs stable version?
A: A milestone is an interim release that has been quality tested, but contains features which may still change before a stable release. Milestones occur roughly every two weeks.

Our stable releases are marked as GA for General Availability, and each takes between 3 and 4 months of development. Our minor increments (1.4 to 1.5) that are released as GA are scoped to include specific features, while the milestones are really snapshots in time along the way towards that stable release.


Q: Webadmin server needs to start first manually, doesn't it?
A: There was a time when webadmin was a standalone entity, but now it starts up with the server. When the server is ready to serve data, webadmin is ready to display it.

Q: What about zoom in zoom out, leveling type of graphs?
A: Webadmin isn’t a full-featured graph visualization tool, though we will continue to develop incremental improvements. We consider it almost more of a debugging tool, to easily explore the details of a graph. For more elaborate graph visualization take a look at something like Gephi.

Q: Do you plan to identify, present the added value complementary product around Neo4j?
A: There is a vibrant ecosystem of tools and libraries built around Neo4j. We will indeed be highlighting the more prominent projects on our website and would also consider special blogs, meetups, or webinars about them. Let us know if there’s something you’d be interested in hearing about.

Q: How do you manage from a graph perspective, node or relationship aggregation, node removal...?
A: Aggregation operations are possible with both Cypher and Gremlin. Again, check out the Neo4j Manual or the Gremlin wiki. Deleting is a single node operation; while you could script it with Gremlin, there isn’t at the moment a “delete all” operation for deleting sets of nodes at once.

Q: Is it possible to specify relationships between relationships? For example, saying that "KNOWS" relationship is like "RECOGNIZES", so the query returns results from both KNOWS and RECOGNIZES?
A: In Neo4j’s “property graph” model, relationships do have a type, but it isn’t a type system. The “type” of a relationship is really a simple named tag, or label. This is an interesting suggestion and we’ll consider whether to support it in the future.

Q: How hard is it to embed neo4j in an android app? I've tried quite hard but haven't had success.
A: There was some recent discussion about this on the mailing list. Have a look at the discussion here: http://bit.ly/oyb6pb

Q: How does clustering handle simultaneous writes?
A: Writes are always serialized through the master. So, multiple concurrent transactions compete for a write lock, executing one at a time. More details can be found in the Neo4j Manual section on transactions.
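
The idea of competing for a write lock can be sketched with a plain Ruby `Mutex`. This is only a toy model of the description above, not Neo4j's clustering or locking code, and the `ToyMaster` class is a hypothetical name invented for the example.

```ruby
# Toy model of a master that serializes writes. Names are hypothetical.
class ToyMaster
  attr_reader :log

  def initialize
    @write_lock = Mutex.new
    @log = []
  end

  # Each transaction must hold the write lock, so writes apply one at a time.
  def write(tx_name)
    @write_lock.synchronize do
      @log << "#{tx_name}:begin"
      sleep 0.01 # simulate work done while holding the lock
      @log << "#{tx_name}:commit"
    end
  end
end

master = ToyMaster.new
threads = %w[tx1 tx2 tx3].map { |name| Thread.new { master.write(name) } }
threads.each(&:join)

# Because of the lock, each begin is directly followed by its own commit.
master.log.each_slice(2) { |started, committed| puts "#{started} -> #{committed}" }
```

The order in which the three transactions run may differ between runs, but their log entries never interleave.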

Q: How can we ensure the transactions are in sync with DB if we have graph database on top of oracle DB?
A: Do you mean storing data in Neo4j, with references to data stored in an Oracle DB? Neo4j can be used with an external transaction manager. Chris Gioran has a nice blog post where he explores that. Read about it here: http://bit.ly/ntaU3W

Q: Can you show some example for multi-relation?
A: Adding multiple relationships is just as easy as adding a single relationship -- just keep adding relationships, even of the same type (perhaps differing in properties). As an example within a social domain, consider:

(andreas)-[:KNOWS]->(peter)
(andreas)-[:WORKS-WITH]->(peter)
(andreas)-[:LIKES]->(bacon)
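
To make the same example concrete without a running server, here is a minimal in-memory sketch in Ruby. The `Relationship` struct and `types_between` helper are hypothetical names for illustration only, and the `since` property is a made-up value; none of this is a Neo4j or neography API.

```ruby
# Toy in-memory property graph. Names are hypothetical, not a Neo4j API.
Relationship = Struct.new(:start_node, :type, :end_node, :properties)

graph = [
  Relationship.new("andreas", "KNOWS",      "peter", { "since" => 2009 }), # made-up property
  Relationship.new("andreas", "WORKS-WITH", "peter", {}),
  Relationship.new("andreas", "LIKES",      "bacon", {})
]

# All relationship types that connect one node to another, in insertion order.
def types_between(graph, from, to)
  graph.select { |r| r.start_node == from && r.end_node == to }.map(&:type)
end

p types_between(graph, "andreas", "peter") # two relationships between the same pair
p types_between(graph, "andreas", "bacon") # a single relationship
```

Nothing special is needed to have several relationships between the same two nodes; each one is simply another entry with its own type and properties.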

Thursday, August 4, 2011

Heroku Neo4j Add-On Available in Private Beta

Hello Graphistas and Rubyists!

Now that Heroku has become a polyglot platform, we’re happy to announce that we’re enhancing the NOSQL ecosystem for Heroku customers and users by releasing the Neo4j-Graph Database Add-On.

The Neo4j-Graph Database Add-On became “public” this week, so it’s available to all registered beta-testers of the Heroku PAAS platform. This is the first step in our efforts to provide hosted Neo4j Servers to a number of PAAS providers. And the Neo4j Add-On is currently available for free with the test plan, which we think is pretty cool.


What is a Graph Database?

A graph database is a type of NOSQL datastore suited for interconnected data. It stores its content as nodes connected via relationships, both of which can have any number of properties (this particular model is known as a property-graph).

Graph databases have constant time query characteristics for local data access which is independent of data set size. They are a perfect fit for storing object networks but don't rely on fixed schemas, so highly dynamic domains are more than welcome (in both dimensions - properties and connections).

In terms of scalability, Neo4j currently scales up well since graphs tend to require far fewer writes than a traditional RDBMS to achieve the same informational goals. Neo4j scales out reasonably too (especially for reads) through master/slave replication (where slaves are writable). An instance of Neo4j can handle billions of nodes and relationships on a single machine and typically surpasses other technologies (like RDBMS) when executing the graph equivalent of multi-join queries.

Provisioning a Neo4j Server instance

For the Heroku Neo4j Add-On we've created our own hosting infrastructure on AWS EC2 instances co-located in the same region (us-east). As a Heroku user, you can add Neo4j to your application by simply issuing the following command:
> heroku addons:add neo4j:test
A Neo4j Server will then be provisioned for your application (with currently generous amounts of RAM). You may visit the heroku page for your add-on to see the following information.

Neo4j Add-On Settings


Most of this connection information is also available via
> heroku config

The first link points to the Neo4j Web Administration UI which allows you to visualize and manage certain aspects of your graph database.

The second link (REST-URL) is used by your application to connect to the Neo4j Server.
(Please note that unlike the Neo4j Server available from http://neo4j.org/download this setup requires a username and password for basic authentication).

The Backup & Restore feature allows you to pull the current content of your database at any time (the server is suspended meanwhile) and to replace the available content with alternative datasets.
For your convenience we began to provide datasets and will add more of them in the future so that you can start to explore larger graphs right away.

Please note that to save capacity we suspend idle instances after 24 hours of inactivity. The first request to a suspended instance will take longer than usual as the instance has to be resumed.

Your first Application

The simplest setup for an application using the Neo4j Server is to use a REST library like rest-client. With rest-client, usage looks like this:

irb> p JSON.parse(RestClient.get ENV['NEO4J_URL'])

Remember that instances of Neo4j are suspended after periods of inactivity, so the first call to an instance may take a little longer than normal as the instance is resumed.

A gem like neography encapsulates all the low-level details of the API and provides a clean object-oriented way of interacting with the Neo4j-Server.

# i_love_you.rb
 require 'sinatra'
 require 'neography'

 neo = Neography::Rest.new(ENV['NEO4J_URL'] || "http://localhost:7474")
 
 def create_graph(neo)
   # procedural API
   me = neo.get_root
   pr = neo.get_node_properties(me)
   return if pr && pr['name']
   neo.set_node_properties(me,{"name" => "I"})
   you = neo.create_node("name" => "you")
   neo.create_relationship("love", me, you)
 end
 
 create_graph(neo)
 
 get '/' do
   ## object-oriented API
   me = Neography::Node.load(neo, 0)       
   rel = me.rels(:love).first
   you = rel.end_node
   "#{me.name} #{rel.rel_type} #{you.name}"
 end

(J)Ruby Server Side Extensions

The Neo4j Heroku Add-On contains one more new piece of functionality that can be added to the Neo4j-Server: the ability to extend the behavior of the server in languages other than Java.

Since we want to support efficient execution of code written in dynamic programming languages like Ruby, we've provided a means to move code to the server, where it is executed performantly close to the underlying database. We employ JRuby on the server to run rack applications packaged as Ruby gems (for easier management and versioning).

Those rack applications can use gems like neo4j.rb to access Neo4j directly without any intermediate or remote layers. This allows a more granular, batch-oriented, domain-level REST API to your front-end, providing (or consuming) all the information that has to be exchanged with the graph database in one go.

A simple example of an application split into a persistence back-end and a front-end hosted on heroku is available on the Heroku documentation page or on the Neo4j wiki. We also made the sample code available on github.

Documentation Galore

Everything you read here and even more details are available on our Heroku Neo4j Add-On documentation page. This page is password protected as part of the private beta process. But you can read most of the documentation on the Neo4j Wiki as well.

Tuesday, August 2, 2011

Neo4j 1.4.1 “Kiruna Stol” GA

In the last few weeks since we announced Neo4j 1.4 GA, we’ve been busy working on improvements to the codebase for more predictability, better backup performance, and improved scripts for the server. Ordinarily we’d roll these improvements into a milestone, but this time around we think they’re important enough to warrant a stable release, and so today we’re announcing the release of Neo4j 1.4.1 GA.

Predictable commit semantics

When working with indexes, there had been some confusion about when index data would become visible with respect to the corresponding graph data. In this release we’ve taken a firm stance on predictability so that in a two-phase commit, the graph datasource will always commit first, and then the index providers.
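
The ordering described above can be pictured with a toy two-phase commit coordinator in Ruby. This is a minimal sketch under stated assumptions, not Neo4j's transaction manager: `ToyResource`, `two_phase_commit` and the resource names are hypothetical, every resource always votes to commit, and rollback on a failed prepare is not modelled.

```ruby
# Toy two-phase commit coordinator. All names are hypothetical.
class ToyResource
  attr_reader :name

  def initialize(name, journal)
    @name = name
    @journal = journal # shared list recording the order of operations
  end

  def prepare
    @journal << "prepare #{@name}"
    true # this toy resource always votes to commit
  end

  def commit
    @journal << "commit #{@name}"
  end
end

# Phase 1: every resource prepares.
# Phase 2: the graph datasource commits first, then the index providers.
def two_phase_commit(graph, index_providers)
  resources = [graph] + index_providers
  return false unless resources.all?(&:prepare) # rollback is not modelled here
  resources.each(&:commit)
  true
end

journal = []
graph = ToyResource.new("graph", journal)
indexes = [ToyResource.new("index-a", journal)]
two_phase_commit(graph, indexes)
puts journal
```

In the printed journal, both prepare entries come before any commit, and the graph's commit comes before the index's commit, so index data never becomes visible ahead of the graph data it refers to.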

Large backup support

In previous versions of Neo4j, very large online backups running over the course of many hours could cause the online backup tool to throw out-of-memory errors, making for an inconvenient backup process. We’ve now hardened the online backup tool and made the chunk size and client read timeout configurable, so things should be much smoother.

Server scripts made more cross-platform

In the 1.4 GA release we removed the 3rd party server wrappers from the codebase since they’d caused so much pain. Instead, we provided bash scripts and batch files to run the Neo4j server. Even though we thought we had some leet bash skills, it turns out that some of the scripts we’d written didn’t work so well with some bash variants. This time around we’re confident that our server management scripts will work on pretty much any environment, so give them a spin.

Bug fixes and improvements

A big thanks to our community for finding and reporting their experiences with the database. Because of these efforts, we’ve fixed some bugs and annoyances in this release including fixing up relationship counts, a possible null pointer exception when adding properties, and dealing with the intricacies of file handling on different operating system and file system combinations. And we took the opportunity to improve our logging for critical exceptions within the transaction manager.

Good news, everyone!

The 1.4.1 GA release is now available from the Neo4j web site, and as always your feedback is welcomed on the mailing list. Download and happy hacking!