Saturday, August 25, 2012

At a conference? Need a dataset? Neo4j at NOSQL NOW

For the "Lunch and Learn around Neo4j" with Andreas Kollegger we wanted to use a dataset that is easy to understand and interesting enough for attendees of the conference.

So we chose to use that day's conference program as the dataset. Conference data is usually well connected and offers the opportunity for challenging data model discussions and insightful queries.

So we set up a Heroku instance hosting an informational website, connected to a provisioned Neo4j database. The site explains Neo4j, the local installation, and the Heroku add-on, and lists available drivers for the different languages.
We then used a small Ruby script with the neography gem by our community rock star Max De Marzi to populate the database. From our example-data site, you can download the graph.db directory for your local Neo4j server.

Andreas ran a very successful session working with the conference dataset. Here are the slides introducing Neo4j and Cypher:

To spark your creativity, we also prepared some more advanced queries that run against the dataset and made them available. You can access the server web interface, which has an interactive console for running the queries and a data browser for visualizing the available data.


Have fun!

Friday, August 24, 2012

New GraphConnect winner from SV Java User Group Meetup

Hi y'all,

I'm back, with another meetup winner of GraphConnect passes. A couple of weeks ago, GraphConnect sponsored the Silicon Valley Java User Group, where about 200 attendees showed up for a discussion about REST APIs at Google HQ in Mountain View.

There was tons of pizza, Fat Tire beer and cookies to be had by all, as well as a lot of enthusiasm for the free GraphConnect t-shirts.

Congrats to Arvid Kumar on the free pass to GraphConnect!

Tuesday, August 21, 2012

Neo4j "Track & Hack" at JRubyConf.EU

I'm super excited to have been able to sponsor, attend, speak and "Track & Hack" at JRubyConf.EU. It was a great time at the event.

With the "track & hack" project we wanted to create an open and live dataset of the presence and interaction of the conference attendees. We used active OpenBeacon RFID tags, which can be used both for calculating positions and for spotting other tags close by, allowing us to accumulate an "interaction graph".

After presenting at an engaging introductory GraphDB-Frankfurt meetup, I attended the very interesting GraphDB-Berlin meetup, which covered the Twimpact event processing framework that can be used as a front end to Neo4j.

Directly after the meetup we traveled to the JRubyConf.EU venue and started setting up the OpenBeacon tracking equipment (thanks for providing it) and attendee-tag-list as well as the first apps. That was quite adventurous and took us half of the night. The next morning we felt a bit sleep deprived.
Early in the morning the folks from OpenBeacon arrived and helped us fine-tune the setup and configure a standalone (Atom) Linux server to collect the information and host the data stream and apps. We connected it to a big screen, and it was a place of curiosity ever after.




The attendees registering for the conference were handed their tags and batteries. Surprisingly, all were very keen to participate, and we got no resistance to the tracking.

I presented a quick intro to Neo4j and our Track & Hack project before the first keynote and had the opportunity for another short talk which outlined how well Neo4j integrates with (J)Ruby.
Once everything was running, I pushed the local RFID graph to Heroku for everyone to look at and play around with.






The Graph visualization is still online together with a realtime position-tracking visualization coded by Norbert (which used the live pusher-stream from the local server).
At the conference and during the following days there was a lot of interest in Neo4j as well as in the tracking and the collected dataset. We had a number of really cool graph-related discussions and lots of fun.


I want to thank the organizers Florian Gilcher and Alex Coles of JRubyConf.EU; Norbert Crombach, who tirelessly worked with me on setting up and running the participant tracking; Stefan Plantikow from MoviePilot for co-sponsoring, co-presenting and helping out; and Milosch Meriac and Jeff from OpenBeacon for being on-site, helping us set up the tracking equipment and providing lots of tips and insights.

Wednesday, August 15, 2012

Neo4j HA over VPN



Neo4j's High Availability enables a torrent of data flowing to clients by replicating the graph across a synchronized cluster of servers. The logical cluster can operate within a single data center, or be spread out across the globe. No problem.
Well, you may still be concerned about data crossing the public internets. One solution is to use a VPN. In this great how-to blog post, John Russell walks you through his experience trying that out.
Read more at his blog, and let us know if you give it a try.

Monday, August 13, 2012

Neo4j 1.8.M07 - Sharing is Caring

Available immediately, Neo4j 1.8 Milestone 7 sets the stage for responsible data sharing. We’re open source. Naturally we’re mindful about supporting...

Open (Meta) Data 

Way back when Neo4j 1.2.M01 was released we introduced the Usage Data Collector (UDC), an optional component which would help us understand how running instances of Neo4j were being used, by reporting back anonymous context information: operating system, runtime, region of the Earth, that kind of thing.

Of course, the source code for the UDC is open and available for inspection. Now, we’re taking some steps to make the meta-data itself available, to make that data useful for everyone in the community, and to do so while being uber sensitive to the slightest hint of privacy concerns.

We’re kinda excited about this, actually. Stay tuned to learn more about what we’re doing, how you can be involved, and how it will be awesome for the community.

Create Unique Data 

Earlier in the 1.8 branch, we introduced the RELATE clause, a powerful blend of MATCH and CREATE. With it, you could insist that a pattern of data should exist in the graph, and RELATE would perform the least creations required to uniquely satisfy the pattern.
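
To make the "least creations" idea concrete, here is a toy sketch in Python. It is not Neo4j code: `ToyGraph` and its `create_unique` method are hypothetical names invented for illustration, and the model only covers the simplest case of a single relationship between two already-known nodes.

```python
from typing import Set, Tuple

Relationship = Tuple[int, str, int]  # (start node id, type, end node id)


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph; not a Neo4j API."""

    def __init__(self) -> None:
        self.relationships: Set[Relationship] = set()

    def create_unique(self, start: int, rel_type: str, end: int) -> bool:
        """Ensure start-[rel_type]->end exists; return True if it was created."""
        rel = (start, rel_type, end)
        if rel in self.relationships:
            return False  # pattern already satisfied, nothing to create
        self.relationships.add(rel)
        return True


graph = ToyGraph()
graph.relationships.add((1, "KNOWS", 3))  # pre-existing relationship

# Ask for KNOWS relationships from node 1 to nodes 3 and 4.
created = [graph.create_unique(1, "KNOWS", right) for right in (3, 4)]
print(created)  # [False, True]: only the missing 1->4 relationship is created
```

Running the same request a second time would create nothing, which is the property that makes the clause safe to repeat.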

In discussion, we kept saying things like “uniquely creates” to describe it, finally realizing that we should name the thing with the much more obvious CREATE UNIQUE.

To upgrade to CREATE UNIQUE, just s/RELATE/CREATE UNIQUE/ like so:
START left=node(1), right=node(3,4)
RELATE left-[r:KNOWS]->right
RETURN r
… becomes...
START left=node(1), right=node(3,4)
CREATE UNIQUE left-[r:KNOWS]->right
RETURN r

Read more details about the clause over in the manual page on it.
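
If you keep queries in source files or strings, the s/RELATE/CREATE UNIQUE/ substitution above could be scripted. The following is a minimal sketch in Python; `upgrade_query` is a hypothetical helper name, and it assumes the keyword is written in upper case and does not appear inside string literals.

```python
import re

# Match RELATE only as a standalone upper-case keyword, so identifiers such as
# "RELATED_TO" are left untouched. Assumption: no string literal contains it.
_RELATE = re.compile(r"\bRELATE\b")


def upgrade_query(query: str) -> str:
    """Rewrite a Cypher query that uses RELATE to use CREATE UNIQUE instead."""
    return _RELATE.sub("CREATE UNIQUE", query)


old_query = (
    "START left=node(1), right=node(3,4)\n"
    "RELATE left-[r:KNOWS]->right\n"
    "RETURN r"
)
print(upgrade_query(old_query))
```

Review the result by hand before running it, since a plain text substitution cannot tell a keyword from the same word inside a quoted string.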

Notable Changes 

Kernel:
  • Traversal framework backwards compatibility
    • Cleaned up any breaking changes
    • Removed Expander#addFilter
  • Kernel JMX bean instance identifier is now reused and can optionally be set explicitly via forced_kernel_id config setting
Server:
  • Consoles in webadmin can now be disabled.
Cypher:
  • Added escape characters to string literals
  • Renamed `RELATE` to `CREATE UNIQUE`
UDC:
  • Added edition information (community, advanced, enterprise)
  • Added a cluster-name hash so that stores originating from the same cluster can be aggregated
  • Fixed release version and revision OS version, architecture, and release
  • Changed precedence of database configuration over internal udc configuration
  • Added distribution information (dpkg, rpm, unknown)

Get Neo4j 1.8.M07

Neo4j 1.8.M07 is available for:
Note: this milestone is not yet available on Heroku

Cheers,
the Neo4j Team

Wednesday, August 8, 2012

Unit Testing with Neo4j using NoSQLUnit

To test a fully assembled system, a database must be properly integrated into the set-up and tear-down of the test environment. With Neo4j, JVM-friendly developers have a distinct advantage because the database can be instantiated directly in code. Though this is incredibly convenient, it still takes a bit of work to get it set up properly.

Now, NoSQLUnit makes it even easier, providing great support for bootstrapping a Neo4j server into your unit testing. Read their full blog post for all the details about how to get started.


Monday, August 6, 2012

New Neo4j Graphdb Meetups

This week, Neo4j announced three new graph database meetups, in the Netherlands, Belgium and France.

With these new additions, Neo4j's Meetup graph has expanded to 27 groups worldwide, with over 2,500 members in 24 different cities.

For those of you who are just getting started with Neo4j, or graph databases in general, your local meetup is a great way to instantly join the graph community, while receiving virtual support from the Neo team.

In anticipation of the inaugural GraphConnect conference in San Francisco, GraphConnect will give away free tickets to any participating graphdb meetup, up until the end of October.

If you would like to start your own Graphdb meetup, feel free to contact the Community Team, and we will get you squared away.


Friday, August 3, 2012

Announcing GraphConnect ticket winner from SF Graphdb Meetup


Last week we had about 50 graphistas show up for our Graphs in the Bay Area meetup, sponsored by GraphConnect and QCon SF. QCon and GraphConnect provided a small keg of local beer, pizza and a pair of free tickets to both conferences.

We had three locals present their projects to the group: Mathieu Bastian, Data Scientist at LinkedIn and co-founder of Gephi, presented common graph database use cases and graph visualizations. Alexander Smirnoff, from Kaiser Permanente, dove into his side project, the Neo4j JCA connector, and his start-up Netoprise, which combines your social graphs into one simple hub. This talk was followed up by Corey Farwell, an undergrad at Cal Poly San Luis Obispo, who demonstrated his RIAARadar app, built on the Neo4j graph database, which easily points you to which musical artists are affiliated with the RIAA.

Today we are announcing the lucky winner of the GraphConnect and QCon SF conference passes. Drum roll......
Congrats Jason McVetta!

Jason is a consultant at Harvard Medical School, and specializes in keeping up to date with different databases and tools.

Stay tuned for the next SF Graphdb Meetup. GraphConnect will be sponsoring it again, and it will be pretty sweet.



Thursday, August 2, 2012

Spring Data Neo4j Webinar follow up


Thanks again, everyone, for attending the Intro to Spring Data Neo4j webinar. We hope you enjoyed the presentation and learned a lot. We have answered all of your questions below. Feel free to use the listed resources to learn more or to discuss your open questions with us.

Spring Data Neo4j is a reflection of the nature of graphs: they are able to work and play well with other systems, while making it easy to make sense of connected data. It also demonstrates that, indeed, graphs are everywhere. For more examples of this, be sure to come to GraphConnect in San Francisco, November 5-6 at the Hyatt Regency. There will be talks by hot start-ups, community contributors, and established enterprises telling their own graph story.

Your Questions and our Answers

Q: Do I always have to start the traversal at the root node?
A: It is possible to start a traversal from any node, or set of nodes. Those nodes can be looked up via an index or their id.

Q: What kind of tools are available to explore graph databases, like Toad, Navicat, etc.?
A: You can use the Neo4j shell to explore the graph with Cypher and other commands. The shell is also available in the Neo4j Server web interface, which offers simple visualization as well. Another tool is Neoclipse. It is also pretty simple to write a custom visualization, e.g. with JavaScript.

Q: With Hibernate, the created SQL can be helpful when trying to do some performance analysis of my queries. Is there something comparable for Neo4j? Or another way to optimize queries?
A: You can set debug to level INFO and then the generated Cypher queries are logged. SDN has custom queries where you can specify exactly how the query looks. We support Cypher and Gremlin. http://static.springsource.org/spring-data/data-graph/snapshot-site/reference/html/#d0e1736

Q: In what format does Neo4j store data in the file system?
A: Custom storage, with optimized separate stores for nodes, relationships and different property types.

Q: In neo4j-server mode, does it support fail-over / load-balancing in case I want multiple db nodes deployed?
A: Yes, Neo4j Enterprise can run in a cluster with High Availability.

Q: In that case, do nodes have to share the storage area, or does data get replicated across nodes?
A: It is configured as master/slave replication, each node running on its own machine and filesystem, using a custom protocol for syncing.

Q: Can you comment on the progress of the Spring-Roo add-on?
A: So far there has been no time to work on the Roo add-on. We'll look into that after the 2.1 release, which is due in about two weeks.

Q: Is it best to index all searchable fields?
A: As often, it depends on the usage pattern and the queries you want to run. There is a write-time price for indexing. Usually you only index the fields you need to look up start nodes for traversals.

Q: How do I make sure the attribute values (e.g. employee id) of a node are not duplicated in my graph DB?
A: You use a unique constraint on an index: http://static.springsource.org/spring-data/data-graph/snapshot-site/reference/html/#d0e2100

Q: How do I do a one-time data setup that is generally required for my enterprise applications?
A: There is a batch inserter if you do not want to use SDN directly: http://docs.neo4j.org/chunked/stable/batchinsert.html

Q: Did you run any performance tests? How does it compare against conventional RDBMSs in terms of performance?
A: It always depends on the use case and data model, which is why generic or synthetic benchmarks are difficult. Graph databases are very fast for highly connected queries (lots of joins). For global queries the graph database doesn't perform that well. Usually local graph queries are executed in constant time regardless of the size of the graph. There is a benchmark example in the first (free) chapter of Neo4j in Action by Manning.
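
As a rough illustration of why a local query does not depend on the total size of the graph, here is a toy Python sketch. It uses a plain adjacency dictionary rather than Neo4j's storage, and the helper names `friends_of_friends` and `build_graph` are invented for this example. The query only follows relationships out of the start node's neighborhood, so adding unrelated nodes changes neither the result nor the work done.

```python
from collections import defaultdict
from typing import Dict, Set


def friends_of_friends(adjacency: Dict[int, Set[int]], start: int) -> Set[int]:
    """Return nodes exactly two hops from start, touching only its neighborhood."""
    direct = adjacency.get(start, set())
    result: Set[int] = set()
    for friend in direct:
        result |= adjacency.get(friend, set())
    return result - direct - {start}


def build_graph(extra_nodes: int) -> Dict[int, Set[int]]:
    """Small neighborhood around node 0 plus a chain of unrelated nodes."""
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for a, b in [(0, 1), (0, 2), (1, 3), (2, 4)]:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Unrelated part of the graph; the query from node 0 never visits it.
    for n in range(100, 100 + extra_nodes):
        adjacency[n].add(n + 1)
        adjacency[n + 1].add(n)
    return adjacency


small = build_graph(extra_nodes=10)
large = build_graph(extra_nodes=100_000)
print(friends_of_friends(small, 0) == friends_of_friends(large, 0) == {3, 4})
```

A global query, such as counting every node, would have to visit the whole chain, which is where the graph's size starts to matter.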

Q: Missed out, what are Repositories meant for?
A: Repositories are facades for data access. DAO is a similar pattern. SDN removes a lot of the boilerplate code you normally have to write.

Q: How is the support for High Availability (clustering / load balancing / fail-over), etc.?
A: We have an HA / master-slave replication solution: http://docs.neo4j.org/chunked/stable/ha-how.html

Q: What is the level of Spring support for Neo4j, for things like transactions?
A: Just add @Transactional to your service methods like you normally would. We also support DI, Exception Translation, Spring Converters, JavaConfig …

Q: Do I need the Spring framework to use this? Can't I use this as a standalone library?
A: Right now, yes. There are plans to make it work, for instance, in a JEE environment via CDI.

Q: Does it work like a JPA-based persistence layer that holds data temporarily in memory and then pushes it to storage?
A: Spring Data Neo4j is similar to JPA but relies on Neo4j's caches and in-memory structures. Spring Data Neo4j reads and writes your objects to the graph or provides a live view (advanced mapping).