Tuesday, February 14, 2012

Webinar Follow Up: How to Get Started with Neo4j

Hey everyone,

We held our How to Get Started with Neo4j webinar last week, and received lots of great questions from our participants.


Here are the questions captured in the Q&A section. If you don't see your question here, please join our Neo4j User Group, where our community will be happy to help you out.


What are your experiences in the Medicare/Medicaid business world, and/or real-world cases that handle thousands of simultaneous requests?
All of our commercial customers using Neo4j in production can be found here. Neo4j is used within the social media space, geo-spatial arena, telcos, and many other sectors.

As for our open-source community, there are so many projects going on, it is better to ask the community yourself. Go to our User Group and ask if anyone is using Neo4j for medical records.


How is the performance and scalability of Neo4j compared with something like MongoDB?
This all depends on what type of data you have. If you want to be able to throw your data somewhere quickly, Mongo is a great tool for that. If you have complex data with lots of connections, and want to be able to quickly retrieve data between different data points, Neo4j is a better fit. The great thing about NOSQL databases is that they are not a "one size fits all" model. In fact, you can use more than one database for your set of data. We happen to be seeing that data is becoming more connected by nature, and the benefits of using a graph database are growing rapidly.


How do you compare Neo4j with Cassandra or Hadoop?
Again, this all depends on what type of data you have. Cassandra is in the column-family category of NOSQL databases, all of which offer great scalability on a very simple data model. Neo4j is on the other end of the curve, with a rich data model but less scalability.

Since Hadoop is a framework for conducting analytics on large data sets, it is more comparable to projects like Golden Orb if you're interested in Pregel-style graph analytics versus map-reduce.


How do you decide between modeling tags as nodes or relations as more and more actions can be performed on tags themselves?
Just as you consider queries when designing an RDBMS schema, it's helpful to consider the graph traversals (queries) you want to run when laying out the structure of your graph. Use a whiteboard, sketch out example data, and see how natural it is to answer questions by following paths in the graph.

On the topic of ACLs, with the rise of OAuth 2 have you seen OAuth 2 token ACLs modeled using a graph DB?

OAuth 2 ACLs do make perfect sense to capture in a graph, but haven't yet come up in our discussion group. Let us know if you embark on such a project. I'm sure a lot of people would find it interesting.


How do you resolve duplicates?
Nodes and Relationships can be created with unique properties. See the documentation for details. For existing data, your application would have to scan through all nodes to check.
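
For the existing-data case, the scan could look roughly like the plain-Java sketch below. It uses an in-memory map of property maps as a stand-in for nodes and does not call any Neo4j API; the names `DuplicateScan` and `findDuplicates` are hypothetical and only illustrate the idea of grouping nodes by the property that is supposed to be unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an in-memory stand-in, not part of Neo4j.
class DuplicateScan {

    // Groups node ids by the value of one property and keeps only
    // the groups that contain more than one node (the duplicates).
    static Map<Object, List<Long>> findDuplicates(
            Map<Long, Map<String, Object>> nodes, String key) {
        Map<Object, List<Long>> byValue = new LinkedHashMap<>();
        for (Map.Entry<Long, Map<String, Object>> entry : nodes.entrySet()) {
            Object value = entry.getValue().get(key);
            if (value == null) {
                continue; // nodes without the property cannot collide on it
            }
            byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(entry.getKey());
        }
        // Drop every value that is held by a single node only.
        byValue.values().removeIf(ids -> ids.size() < 2);
        return byValue;
    }
}
```

What to do with each group of duplicates (merge, delete, or flag) is an application decision that the sketch leaves open.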

The documentation for Cypher seems to be a bit sparse. Are there efforts to more fully document the Cypher query language formally?
While Cypher is very mature, stable, and capable, we have not released a fixed language specification because it continues to evolve. For now, the Neo4j Manual contains all the latest (fully tested) information about Cypher syntax while we work towards a feature-complete language.

Is there a project underway to run Neo4j natively in .NET?
Unfortunately, no. There are client drivers available, but the Neo4j kernel is targeted very specifically at Java 1.6.

What kind of automated operational monitoring support is there in Neo4j (e.g. JMX)?
Neo4j does expose monitoring through JMX, which is also available through REST endpoints that are visible in Webadmin.
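
As a rough illustration of the JMX side, the sketch below uses only the standard `java.lang.management` and `javax.management` APIs to list the MBeans registered under a given domain in the current JVM. The class name `JmxBeanLister` is hypothetical, and the `org.neo4j` domain used in `main` is an assumption about how Neo4j names its beans; it also only sees beans in the same JVM, so it would apply to an embedded database rather than a remote server.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper built on the standard JMX API only.
class JmxBeanLister {

    // Returns the names of all MBeans registered under the given JMX domain
    // in the platform MBean server of this JVM.
    static Set<ObjectName> beansInDomain(String domain) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // "domain:*" is a pattern that matches every bean in that domain.
            return server.queryNames(new ObjectName(domain + ":*"), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid JMX domain: " + domain, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: Neo4j registers its beans under the "org.neo4j" domain.
        for (ObjectName name : beansInDomain("org.neo4j")) {
            System.out.println(name);
        }
    }
}
```

Run in a JVM with no database started, the loop simply prints nothing; a tool such as JConsole shows the same beans interactively.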

How do I migrate existing data from a relational DB to Neo4j?
For initial data import, Neo4j offers a batch insertion mode that relaxes transactional requirements to enable higher throughput. Common practice is to migrate incrementally, identifying the tables involved in complex joins and mapping the schema to a graph layout.

Is there a strategy for some complex relational databases to be migrated to the Neo4j model, for example Sybase to Neo4j?
There is no automated tool for migration from a relational database, though we have done lab work on synchronizing relational tables with a companion graph. Migration is typically achieved with a custom importer written in Java (or any JVM language) which uses the batch insertion mode. See the Neo4j Manual for some guidance.
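
To make the shape of such a custom importer a little more concrete, here is a minimal plain-Java sketch of the mapping step: each table row becomes a node, and each foreign-key or join-table row becomes a relationship. It writes to in-memory collections as a stand-in for the batch insertion mode and does not use any Neo4j API; `RelationalToGraphSketch`, `importRow`, and `importForeignKey` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical importer skeleton: in-memory stand-in, not the Neo4j batch inserter.
class RelationalToGraphSketch {

    static final class Relationship {
        final long from;
        final long to;
        final String type;

        Relationship(long from, long to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    final Map<Long, Map<String, Object>> nodes = new HashMap<>();
    final List<Relationship> relationships = new ArrayList<>();

    // Remembers which node id each (table, primary key) pair was mapped to.
    private final Map<String, Long> nodeIdByPrimaryKey = new HashMap<>();
    private long nextNodeId = 0;

    // One table row becomes one node; its columns become node properties.
    long importRow(String table, Object primaryKey, Map<String, Object> columns) {
        long id = nextNodeId++;
        nodes.put(id, new HashMap<>(columns));
        nodeIdByPrimaryKey.put(table + ":" + primaryKey, id);
        return id;
    }

    // One foreign-key or join-table row becomes one relationship
    // between two rows that were imported earlier.
    void importForeignKey(String fromTable, Object fromKey,
                          String toTable, Object toKey, String type) {
        Long from = nodeIdByPrimaryKey.get(fromTable + ":" + fromKey);
        Long to = nodeIdByPrimaryKey.get(toTable + ":" + toKey);
        if (from == null || to == null) {
            throw new IllegalArgumentException("Both rows must be imported before linking them");
        }
        relationships.add(new Relationship(from, to, type));
    }
}
```

Importing all rows first and the foreign keys second keeps the lookup simple; a real importer would replace the two collections with calls into the batch insertion mode described in the Neo4j Manual.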

Is there something more on data modelling in Neo4j, and how to structure data?
We have given workshops about best practices, and will consider scheduling a webinar and writing some blog posts to discuss best practices for structuring data in a graph.

Is there a way to apply continuous location-dependent queries (over moving objects) on graph-based spatial models in Neo4j?
This sounds like an RDBMS view, which doesn't have an equivalent in Neo4j. Traversal queries execute quickly, but are lazy-loaded without read-locking. So the trick here would be balancing write updates for the moving objects with the timing of the spatial reads.



We have some great meetups and events coming up, and don't forget to sign up for our next webinar on Spring Data Neo4j in the Cloud, taking place February 16th.


-ayeeson
