Sunday, April 5, 2009

The current database debate and graph databases

Over the past month or so there has been a lot of new energy in the database debate. People are questioning the RDBMS to an extent that we're not used to. Tony Bain asks: Is the Relational Database Doomed? He talks about the simplicity, robustness, flexibility, performance, scalability, and compatibility of databases, where the RDBMS has been "good enough" in all areas in most cases. He goes on to say:

Today, we are in a slightly different situation. For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability.

He then describes alternatives to the traditional RDBMS, chiefly giving an overview of key/value databases.

Robin Bloor commented on the article and, among other things, identifies three flaws in the relational data model:

  1. it has problems representing common data structures like ordered lists, hierarchies, trees or web page content
  2. it isn't helpful when the data model evolves over time
  3. its access semantics are limited to those provided by storing items as rows in tables
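To make the first flaw a bit more concrete, here is a minimal sketch in Python of a hierarchy stored as rows. The `category(id, parent_id, name)` table, its contents, and the helper functions are hypothetical and not taken from any of the cited articles. It only illustrates that collecting a subtree from row-shaped data takes one lookup per visited node; real databases offer other options, such as recursive queries, that this sketch does not cover.

```python
# Hypothetical table "category(id, parent_id, name)" held as plain rows.
rows = [
    (1, None, "root"),
    (2, 1, "databases"),
    (3, 2, "relational"),
    (4, 2, "graph"),
    (5, 1, "languages"),
]


def children(parent_id):
    """One 'query' against the table: all rows whose parent_id matches."""
    return [row for row in rows if row[1] == parent_id]


def descendants(node_id):
    """Collect the names in the subtree below node_id.

    Returns the names found and the number of lookups that were needed.
    """
    found = []
    frontier = [node_id]
    lookups = 0
    while frontier:
        current = frontier.pop()
        lookups += 1  # each visited node costs one more lookup
        for row in children(current):
            found.append(row[2])
            frontier.append(row[0])
    return found, lookups


names, lookups = descendants(1)
print(sorted(names), lookups)
```

The hierarchy is never stored as such; it has to be reassembled from `parent_id` values every time it is needed.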

What about the alternatives to the RDBMS? Most of the buzz is around key/value stores and schema-less databases. Richard Jones wrote the article Anti-RDBMS: A list of distributed key-value stores, listing the most important projects. Going schema-less can take very different forms, and the solution FriendFeed uses may not be the most common way to implement this.

What, then, is the most important difference between key/value stores and a graph database? Key/value stores lack built-in support for representing relationships between entities, and capturing such relationships is fundamental to data modeling. As Neo4j supports arbitrary key/value pairs on nodes and relationships, it could just as well be viewed as a key/value store with full built-in support for relationships.
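The idea of "key/value pairs plus relationships" can be sketched in a few lines of Python. This is an illustration only and not Neo4j's API: the `Node` and `Relationship` classes, the `relate_to` and `neighbors` methods, the `KNOWS` relationship type, and the sample data are all assumptions made up for this example.

```python
class Relationship:
    """A typed, directed link between two nodes with its own key/value pairs."""

    def __init__(self, start, end, rel_type, properties):
        self.start = start
        self.end = end
        self.rel_type = rel_type
        self.properties = dict(properties)


class Node:
    """An entity carrying arbitrary key/value pairs and its outgoing links."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.outgoing = []

    def relate_to(self, other, rel_type, **properties):
        """Create a relationship from this node to another and return it."""
        rel = Relationship(self, other, rel_type, properties)
        self.outgoing.append(rel)
        return rel

    def neighbors(self, rel_type):
        """Nodes reachable over one outgoing relationship of the given type."""
        return [rel.end for rel in self.outgoing if rel.rel_type == rel_type]


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")

knows = alice.relate_to(bob, "KNOWS", since=2007)
bob.relate_to(carol, "KNOWS", since=2009)

# Two-step traversal: whom do the people Alice knows know?
friends_of_friends = [
    friend_of_friend.properties["name"]
    for friend in alice.neighbors("KNOWS")
    for friend_of_friend in friend.neighbors("KNOWS")
]
print(friends_of_friends)
```

In a plain key/value store, the `KNOWS` links would have to be encoded inside the values and followed by application code; here they are part of the data model, and the traversal simply follows them.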

Because of the advantages of using a graph database and the lack of products available in the market today, companies tend to build their own graph database as the underpinning of highly scalable applications. Recently Scott Wheeler of Directed Edge wrote about their in-house database solution: On Building a Stupidly Fast Graph Database. In the associated Hacker News discussion, he also mentioned that they tried Neo4j but were disappointed with the performance. This puzzles us, since Neo4j is used in production with far larger amounts of data than what's mentioned in the discussion.

In summary, there's an increasing awareness that the 30+-year-old relational model may not be optimal for a lot of the use cases that we as developers encounter today. It's great to see that the community is seriously starting to tackle the challenges of efficiently handling information in the 21st century!