Tuesday, July 31, 2012

Neo4j with JPA or JDO

The DataNucleus project provides Java applications with a consistent, standards-compliant platform for data management. With the release of datanucleus-neo4j, DataNucleus has added Neo4j to the broad range of data stores it supports.

The v3.1.0-m1 release of the "datanucleus-neo4j" plugin includes support for embedded and nested embedded fields and the querying of those fields, use of the Neo4j node id as a global object identity, a check for duplicated identity on persist, JDOQL/JPQL ordering in Cypher, and much more.

Read all the details about datanucleus-neo4j in the official blog post, then follow along with either the JPA Tutorial or the JDO Tutorial.

Monday, July 23, 2012

How Neo4j Achieves High Availability


Neo4j's High Availability solution provides replication across a cluster of machines, for maximum read scaling and reliable uptime. Chris Gioran takes a detailed look at Neo4j's approach for managing master election in this excellent blog post.

Friday, July 13, 2012

Neo4j 1.8.M06 - Rolling Upgrades

Neo4j 1.8 Milestone 6 covers all major improvements of the 1.8 roadmap. Among the usual tweaks and updates, this milestone provides a welcome feature for operations engineers – rolling upgrades across a cluster.

Rolling Upgrades

There is a subtle operational challenge when managing database upgrades over a cluster. We chatted with the ever clever Chris Gioran about rolling upgrades:

ABK: So Chris, what prompted the development of rolling upgrades?
CG: What we’re trying to achieve is, when you have an HA cluster that runs on a capable version — starting from 1.5.3 onwards, including the 1.6 and 1.7 series — the exercise is to upgrade everything without disturbing the operation of the cluster. The cluster will upgrade, while continuing to serve requests from either slaves or masters.
ABK: Can’t this be done today by just upgrading one instance at a time, leaving the rest running?
CG: Not necessarily.
ABK: What’s the problem with that?
CG: The problem is when we have breaking changes in the protocol used to communicate between instances. For example, going from 1.5.3 to 1.7, it’s not possible to have a slave on 1.7 talking to a 1.5 master (or vice versa) because we’ve made changes for performance and stability to the protocol itself.
ABK: With rolling upgrades, each of these different versions, though speaking different protocols, will gracefully coordinate?
CG: Yes.
ABK: Describe how that actually happens.
CG: So the rolling upgrade, actually, works exactly as you’d expect an upgrade would work. If there are no breaking changes between versions, you normally begin with the slaves, powering down, copying the store, migrating configuration if needed, then bringing that server back up. The new version would take over, communicate with the rest of the cluster and you wouldn’t notice anything.
A rolling upgrade offers that with versions that have incompatible protocols. Each slave, as it is brought up, detects the version running in the cluster and gracefully falls back into a compatibility mode that doesn’t allow it to become master, but allows it to continue to execute transactions.
ABK: Does order matter?
CG: Ordering does matter. It won’t break things, but it is better to start with the slaves. We’ve defined the point where the cluster as a whole has an upgraded version, so the moment that master switch happens it switches from the old version to the new version. You leave the master as the last machine running the old version. When you bring that down then a new version will become master. The rest of the slaves will detect that, then will roll forward to the new version, and continue operating.
ABK: That sounds great. And all the way back to 1.5.3. This is fantastic. Thanks so much for explaining this, Chris.
CG: Happy to make things work.
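
The ordering Chris describes can be illustrated with a small toy model. This is only a sketch of the procedure from the interview, not Neo4j code: the class `RollingUpgradeSketch`, its methods, and the rule that the first member starts as master are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the upgrade order from the interview. Hypothetical names, not Neo4j's implementation. */
public class RollingUpgradeSketch {

    /** One cluster member and the version it currently runs. */
    static final class Instance {
        final String name;
        String version;
        Instance(String name, String version) { this.name = name; this.version = version; }
    }

    final List<Instance> members = new ArrayList<>();
    Instance master;

    RollingUpgradeSketch(String oldVersion, String... names) {
        for (String name : names) members.add(new Instance(name, oldVersion));
        master = members.get(0); // assumption: the first member starts out as master
    }

    /** A slave on a different version than the master runs in compatibility mode:
     *  it keeps executing transactions but cannot become master. */
    boolean inCompatibilityMode(Instance i) {
        return i != master && !i.version.equals(master.version);
    }

    /** Models powering down one slave, installing the new version and bringing it back up. */
    void upgradeSlave(Instance slave, String newVersion) {
        if (slave == master) throw new IllegalArgumentException("upgrade the master last");
        slave.version = newVersion;
    }

    /** The old master goes down last; an already upgraded slave takes over as master. */
    void upgradeMaster(String newVersion) {
        Instance oldMaster = master;
        for (Instance i : members) {
            if (i != oldMaster && i.version.equals(newVersion)) { master = i; break; }
        }
        oldMaster.version = newVersion; // the old master rejoins as a slave on the new version
    }

    /** Slaves first, master last. Returns a log of the steps taken. */
    List<String> rollingUpgrade(String newVersion) {
        List<String> log = new ArrayList<>();
        for (Instance i : new ArrayList<>(members)) {
            if (i == master) continue;
            upgradeSlave(i, newVersion);
            log.add(i.name + " -> " + newVersion
                    + (inCompatibilityMode(i) ? " (compatibility mode)" : ""));
        }
        String oldMasterName = master.name;
        upgradeMaster(newVersion);
        log.add(oldMasterName + " -> " + newVersion + ", new master: " + master.name);
        return log;
    }

    public static void main(String[] args) {
        RollingUpgradeSketch cluster = new RollingUpgradeSketch("1.7", "a", "b", "c");
        cluster.rollingUpgrade("1.8").forEach(System.out::println);
    }
}
```

In this model the upgraded slaves stay in compatibility mode for as long as the old master is up, and leave it at the moment the master switch happens.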

Notable Changes

Kernel:
  • Deprecated AbstractGraphDatabase.transactionRunning()
  • Changed synchronization of applying transactions to prevent a deadlock scenario
  • Original cause can be extracted from a transaction RollbackException
Server:
  • Fixed issue that stopped the server from starting without the UDC-jars.
Cypher:
  • Fixed problem when graph elements are deleted multiple times in the same query
  • Fixed #625: Values passed through WITH have the wrong type
  • Fixed #654: Some paths are returned the wrong way
HA:
  • Added a transaction push factor that can be configured with the number of slaves to which a transaction should be pushed. The master will optimistically push each transaction before tx.finish completes to reduce the risk of branched data.
  • Added the ability for rolling upgrades from versions 1.5.3 onwards.
  • Changed the way master election notification and data gathering works, leading to massively reduced writing of data to the zookeeper service and a subsequent performance increase.

Get Neo4j 1.8.M06

Neo4j 1.8.M06 is available for:
Cheers,
the Neo4j Team

Friday, July 6, 2012

Cypher JDBC Tools Testing Results

Two weeks ago we published a call for help with testing the Neo4j JDBC driver in the wild.
Ralf Becher from TIQView stepped up and tested a LOT of JDBC tools with the driver. Thanks a lot for this engagement, Ralf!
Luanne Coutinho, the winner of our Heroku Contest, and Michael Wilmes also took up the challenge of testing the JDBC driver with real-world tools. Thank you!
I had some fun but not so much success testing some command line JDBC clients with the driver.
The complete list of tests is in the published Google spreadsheet.
The following is quite an impressive list. Feel free to add your favorite JDBC tool to it by following the instructions in our Call-To-Action post.

Eclipse BIRT


Works very well. You just need to point out the driver in the driver setup, and then edit the SQL queries in the DataSet view to be Cypher queries. I tested against a local Neo4j Server instance.
Peter Neubauer

SQuirrel SQL v.3.3.0


All Cypher queries did work very well.
Ralf Becher Link to Blog Post

Aqua Data Studio 4.7.2

Tested with normal read queries as well as delete queries. Works. There are no optional "TYPE"/"HAS_PROPERTY" nodes in the queried DB, so browsing the DB schema is not possible. The application handles this well: there is no error message, and the tree structure of the DB view just stays blank.
Michael Wilmes

QlikView


Works well, detailed explanations and examples with queries, visualization and transformation in the blog post.
Ralf Becher

Pentaho Kettle Dataintegration


Worked, except that a RETURN of a node or relationship causes an exception. Detailed explanation and example in the blog post.
Ralf Becher

SQL Query Plugin for IntelliJ


Worked pretty well. I was able to execute a couple of queries, including mutating ones, which is really helpful. The schema browser was understandably not very interesting. Tried a couple of data exports (html, csv and xml); samples uploaded to dropbox. Do the autocommit settings in these tools apply? In case they do, I played around with them without much success. With the default (autocommit=true), I created a node and then queried it by ID, and got a SQLException. Only if I explicitly commit does the node get returned. Then I turned autocommit off and created a node: same thing, unable to query it until I commit. Also unable to update a property until I commit the create. After the commits, I updated a property and rolled back (also tried disconnecting and killing IntelliJ), but the update had been committed anyway.
Luanne Link to screenshots and data files.
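
For comparison, this is how explicit transaction boundaries are normally expressed through the standard `java.sql` API. It is a minimal sketch: the class `ExplicitCommitSketch` and the helper `runInTransaction` are hypothetical names, the URL is the one from the SQLShell configuration in this post, and whether the Neo4j JDBC driver honors these calls is exactly the open question in the report above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch of explicit commit/rollback with plain JDBC. Hypothetical helper, not part of the driver. */
public class ExplicitCommitSketch {

    /** Runs all statements as one unit: commit if every statement succeeds, roll back otherwise. */
    static void runInTransaction(Connection conn, String... statements) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);              // take control of the transaction boundary
        try (Statement st = conn.createStatement()) {
            for (String s : statements) {
                st.execute(s);
            }
            conn.commit();                      // make the changes visible
        } catch (SQLException e) {
            conn.rollback();                    // undo everything since the last commit
            throw e;
        } finally {
            conn.setAutoCommit(previous);       // restore the caller's setting
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumes the Neo4j JDBC driver is on the classpath and a server runs locally.
        // Each command-line argument is executed as one Cypher statement.
        try (Connection conn = DriverManager.getConnection("jdbc:neo4j://localhost:7474")) {
            runInTransaction(conn, args);
        }
    }
}
```

If a driver ignores `setAutoCommit` or `rollback`, a helper like this will still run, but the rollback path will not undo anything, which would match the behavior Luanne observed.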

Ataccama DQ Analyzer


The Ataccama DQ Analyzer works well with Neo4j. It gives us the possibility to do data profiling on graph data; more details in the blog post.
Ralf Becher

SQLExplorer


Works smoothly. Details and many screenshots in the blog post.
Ralf Becher

DbVisualizer


Worked well (with a custom connection implementation for the database metadata).
Rickard Öberg Link to Blog Post

LibreOffice


When it comes to using a database as a reporting tool, one of the simplest things you can do is use one of the Office packages to connect to a database and use the data for charts and spreadsheets. Since LibreOffice has pretty good JDBC connectivity I tried it out, and here’s the result.
Rickard Öberg Link to Blog Post

ODBC in Windows


While having a JDBC driver is great, not all tools that work with databases use JDBC. Some use ODBC instead, and since there is an ODBC-JDBC Gateway available from Easysoft I wanted to try this out. After installing this software it was really easy to set up a connection, and then connect to it using a standard ODBC tool.
Rickard Öberg Link to Blog Post

IntelliJ


Lastly I tried using the JDBC driver with IntelliJ, my Java IDE of choice. This worked out really well, and with some configuration it even allows me to enter values for parameterized queries, which is nice.
Rickard Öberg Link to Blog Post

HenPlus


Didn't work. Got as far as registering the driver, but it seems to parse SQL, so it didn't execute the Cypher statement and instead tried to load a file.
Michael Hunger

jisql


It did not work. It tries to parse the SQL and so doesn't execute Cypher. Had to fix the broken start script first, and even then it just hung at the prompt.
Michael Hunger

SQL Workbench


Works very well, simple setup, easy to use.
Michael Hunger

SQLShell (clapper)


First of all, it works! A command-line shell written in Scala, hosted on GitHub but with a graphical installer :)
I installed it to a local directory, added neo4j-jdbc.jar to the lib directory and set up the config file.
[drivers]
# Driver aliases.
neo4j = org.neo4j.jdbc.Driver

[db_neo4j]
aliases: neo4j
url: jdbc:neo4j://localhost:7474
driver: neo4j
user: 
password: 
history: $vars.historyDir/neo4j.hist


Then you start it with: sh bin/sqlshell -c config.cfg neo4j


SQLShell, version 0.8.1 (2012/03/16 09:43:31)
Copyright (c) 2009-2011 Brian M. Clapper
Using JLine
Type "help" for help. Type ".about" for more information.

sqlshell> start n=node(0) return n
Executing query: start n=node(0) return n
 with params{}
Starting the internal [HTTP/1.1] client
Execution time: 0.792 seconds
Retrieval time: 0.1 seconds
1 row returned.

n                      
-----------------------
{_node_id=0, name=root}

sqlshell> start n=node(*) return n limit 20
Executing query: start n=node(*) return n limit 20
 with params{}
Execution time: 0.770 seconds
Retrieval time: 0.13 seconds
20 rows returned.
The only drawback so far is that the tool destroys the terminal, so you have to do a "reset" afterwards.
Michael Hunger
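
The driver class and URL from the SQLShell configuration above should also work from plain Java through the standard `java.sql` API. The following is a minimal sketch under that assumption: the class `CypherJdbcSketch` and its helpers `url` and `query` are hypothetical names, and running `main` requires the Neo4j JDBC driver on the classpath and a server on localhost:7474.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: run a Cypher query through JDBC and collect the rows. Hypothetical helper names. */
public class CypherJdbcSketch {

    // Driver class name as listed in the SQLShell [drivers] section.
    static final String DRIVER = "org.neo4j.jdbc.Driver";

    /** Builds a URL in the form used by the SQLShell config, e.g. jdbc:neo4j://localhost:7474. */
    static String url(String host, int port) {
        return "jdbc:neo4j://" + host + ":" + port;
    }

    /** Executes the Cypher text and returns each row as a column-label to value map. */
    static List<Map<String, Object>> query(Connection conn, String cypher) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(cypher)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int col = 1; col <= meta.getColumnCount(); col++) { // JDBC columns are 1-based
                    row.put(meta.getColumnLabel(col), rs.getObject(col));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        Class.forName(DRIVER); // explicit registration, in case the driver does not self-register
        try (Connection conn = DriverManager.getConnection(url("localhost", 7474))) {
            // Same query as in the SQLShell session above.
            for (Map<String, Object> row : query(conn, "start n=node(0) return n")) {
                System.out.println(row);
            }
        }
    }
}
```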