The first GraphGist Challenge completed


Update: This post is from 2013; the GraphGist infrastructure and links have changed several times since then. If you are looking for a particular GraphGist, please head to the GraphGist portal and search there for its title.

We’re happy to announce the results of the first GraphGist challenge.
Anders Nawroth
First of all, we want to thank all participants for their great contributions. We were blown away by the high quality of the submissions. Everyone has put in a lot of time and effort, providing thoughtful, interesting and well-explained data models and Cypher queries. There was also great use of graphics, including use of the Arrows tool. We thought we had high expectations, but the contributions still exceeded them by far. In this sense, everyone is a winner, and we look forward to sending out a cool Neo4j t-shirt and Graph Connect ticket or a copy of the Graph Databases book to all participants. And for the same reason, we strongly advise you to go have a look at all submissions.
Here are all the contributions:
As you can imagine, we had a hard time deciding which contributions should get the first, second and third prize. Anyhow, here’s the result, in reverse order:

Third Prize

In third place, we find Chess Games and Positions by Wes Freeman. He makes it all sound very simple:
The goal is to load a bunch of chess games into Neo4j for further analysis. Scores listed are Stockfish’s take on a position after a 25 move horizon (but this number can be deepened as the graph is filled out or as more processing is done). Positions can also be loaded as alternative moves (not connected to a game) based on suggestions from Stockfish. The positions are recorded as FEN, a human-readable/compressed chess board state notation.
And the data model is not overly complex at all; here’s a bit of example data:
We thought GraphGists already had quite a lot of interactivity, but Wes shows how to get even more into a GraphGist. After simply listing the moves of a game, he goes on to show off some cool statistics, which reveal the blunders in a game and even suggest better moves.

Second Prize

Learning Graph by Johannes Mockenhaupt comes in at second place. Here’s his own introduction to it:
This graph is used to visualize the knowledge a person has in a certain area. … The purpose is to document acquired knowledge and to help to further educate oneself in a structured way. This is accomplished by graphing dependencies between technologies as well as resources that can be used to learn a technology and to determine possible learning paths through the graph, which show a way to learn a specific technology, by first learning the technologies, in order, which are prerequisites for the technology to be learned. The graph is meant not to be static, but updated as new connections between technologies are discovered and new knowledge is acquired.
This is how the data model plays out with a tiny set of data:
The data model is easy to grasp, and at the same time, it shows the power of graphs in a prominent way. The queries are surprisingly simple: if you have ever tried to do something similar using an RDBMS, you’ll appreciate the straightforwardness and elegance of the queries presented! It’s also nice to see how the data gets updated along the way. Finally, the explanations of the queries and their results bind everything together to form a pleasant read.
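To give a feel for what such a learning-path query can look like, here is a minimal sketch. It is not taken from Johannes’ GraphGist: the Technology label, the REQUIRES relationship, the name property and the 'Neo4j' target are all hypothetical names chosen for illustration.

```cypher
// Hypothetical model (an assumption, not the GraphGist's actual schema):
//   (:Technology {name})-[:REQUIRES]->(:Technology {name})
// Follow the prerequisite chain from the target back to technologies
// that have no prerequisites of their own.
MATCH path = (root:Technology)<-[:REQUIRES*]-(target:Technology {name: 'Neo4j'})
WHERE NOT (root)-[:REQUIRES]->()
// nodes(path) runs from root to target, i.e. in the order to learn things
RETURN [t IN nodes(path) | t.name] AS learning_path
```

Each returned row would be one possible path, listing the prerequisites first and the target technology last.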

First Prize

The US Flights & Airports contribution from Nicole White finished first in this challenge. Congrats Nicole! Here’s the background:
For any airline carrier, efficiency is key: delayed or cancelled flights and long taxi times often lead to unhappy customers. Flight planning is one of the most complex optimization and scheduling problems out there, requiring a deep analysis of flight and airport data.
The proposed data model is simple, yet it allows complex questions to be answered, which is one of the strengths of a graph database. The interesting details lie not just in modeling the flights but also in modeling the cancellations and delays.
Nicole posed interesting questions about the data model and dataset, which she then answered using Cypher queries:
  • What is the average taxi time at each airport for both departures and arrivals?
  • What is the leading cause of departure delays at each airport?
  • How many outbound flights were cancelled at each airport?
Or more specific questions such as:
  • Which flights from Los Angeles (LAX) to Chicago (ORD) were delayed for more than 10 minutes due to late arrivals?
  • How does seasonality affect departure taxi times at Chicago’s O’Hare International Airport (ORD)?
  • What is the standard deviation of arrival taxi times at Dallas/Fort Worth (DFW)?
To show just one example: Which flights from Los Angeles (LAX) to Chicago (ORD) were delayed for more than 10 minutes due to late arrivals?
MATCH (a)<--(f)-->(b), (f)-[r:DELAYED_BY]->(d)
WHERE a.name="Los Angeles International Airport" AND b.name="O'Hare International Airport"
      AND r.time > 10 AND d.name="Late Aircraft"
WITH f, r.time AS latedelay
RETURN f.flight_number AS Flight, latedelay AS `Delay Time Due to Late Arrival`
This query results in:
Flight   Delay Time Due to Late Arrival
1062     16
1894     15
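The DELAYED_BY relationship used in that query lends itself to aggregation as well. As a rough sketch, and not a query from Nicole’s GraphGist, the following ranks delay causes across all flights. It relies only on names visible in the query above (the DELAYED_BY relationship, its time property and the name of the delay cause) and assumes that time is numeric; the column aliases are illustrative.

```cypher
// Count delayed flights per cause and average the delay time,
// using only the DELAYED_BY relationship shown in the example query.
MATCH (f)-[r:DELAYED_BY]->(d)
RETURN d.name AS Cause,
       count(f) AS Flights,
       avg(r.time) AS `Average Delay Time`
ORDER BY Flights DESC
```

Answering the per-airport version of the question would additionally require matching each flight to its departure airport, as the example query does for LAX and ORD.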
With her scientific approach, listing the variables included and using MathJax to render the mathematical formulas, this submission is really impressive and a worthy winner. Our congratulations go to every participant and the winners. We are really thrilled about the results of this competition.

GraphGists evolving & The next GraphGist Challenge

During the challenge we improved the code behind GraphGists:
  • We added support for Math formulas.
  • We added Disqus integration, so there are now comments connected to each GraphGist. Please add your comments to the challenge contributions; the authors will be happy to get feedback and suggestions.
  • We removed the annoying headings above result tables and graphs.
  • We fixed some issues and added a workaround so Chrome under Windows doesn’t crash.
  • We improved the styling a bit. (It’s still very primitive though.)
Thanks for everyone’s feedback: it helped us iron out some of the shortcomings.
If you want to have a look at the GraphGist project, it’s located here: https://github.com/neo4j-contrib/graphgist. It’s a client-side-only, browser-based application, meaning it’s basically a bunch of JavaScript files. We’d be happy to see pull requests for the project. Please note that you can contribute styling or documentation (as a GraphGist), not only JavaScript code!
We have already received questions about the next GraphGist challenge. Our plan is to run the next challenge around the time Neo4j 2.0 gets released. Currently we think that will mean a closing date before Christmas. We’ll keep you posted when we know more.
Greetings from the Neo4j GraphGist Challenge gang!
Anders Nawroth, Peter Neubauer, Michael Hunger, Pernilla Lindh, Mark Needham, Kenny Bastani