Mass / Batch Importing

Importing many nodes or relationships at once is a common use case. Often the naive approach can be slow because each query is done over a separate HTTP request. There are a number of ways to improve this:

  • The neo4j-core gem (starting with version 7.0) supports batch execution of queries: call the queries method on a CypherSession. (There is not yet a way to do this through ActiveNode and ActiveRel in the neo4j gem.)
  • Even batched queries require sending a large payload of queries, so consider making a single Cypher query with an array parameter instead. The UNWIND clause turns the array into a series of rows, and a CREATE clause then makes one creation per row.
  • The neo4apis gem offers a way to create a DSL for defining and loading data and will batch creations for you (see the neo4apis-github and neo4apis-twitter gems for examples of implementing a neo4apis DSL)
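The UNWIND approach can be sketched roughly as follows. This is a minimal illustration, not part of either gem: the import_people helper, the :Person label, and the slice size are hypothetical, and session is assumed to be any object whose query method accepts a Cypher string and a hash of parameters (as a neo4j-core CypherSession does). The $rows parameter syntax assumes Neo4j 3.0 or later; older servers use {rows} instead.

```ruby
# Hypothetical query: creates one :Person node per map in the `rows` parameter.
UNWIND_CREATE = 'UNWIND $rows AS row CREATE (p:Person) SET p = row'

# Hypothetical helper: sends the rows in slices so that no single
# request carries an unbounded payload.
def import_people(session, rows, slice_size: 1000)
  rows.each_slice(slice_size) do |slice|
    # One HTTP request per slice instead of one per node.
    session.query(UNWIND_CREATE, rows: slice)
  end
end

# Usage (assumes an established neo4j-core session named neo4j_session):
# import_people(neo4j_session, [{ name: 'Ann' }, { name: 'Bob' }])
```

Slicing is optional. It only bounds the size of each request when the input array is very large.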

Outside of Ruby, there are also standard ways of importing large sets of data:

  • The LOAD CSV clause allows you to take a CSV in any format and create your own custom Cypher logic to import the data
  • The Neo4j import tool requires a specific CSV format for nodes and relationships, but it can be extremely fast. (Note that the import tool can only be used to create a new database, not to add to an existing one)
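Although LOAD CSV is plain Cypher, it can still be issued from Ruby like any other query. The sketch below is only illustrative: the people.csv file, its name and email columns, the :Person label, and the load_people_csv helper are all hypothetical, and the file must be readable by the Neo4j server itself, not by the Ruby process.

```ruby
# Hypothetical import query: one :Person node per CSV row.
# WITH HEADERS exposes each row as a map keyed by column name.
LOAD_PEOPLE = <<~CYPHER
  LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
  CREATE (:Person {name: row.name, email: row.email})
CYPHER

# Hypothetical helper: `session` is assumed to respond to `query`
# the way a neo4j-core CypherSession does.
def load_people_csv(session)
  session.query(LOAD_PEOPLE)
end

# Usage (assumes an established neo4j-core session named neo4j_session):
# load_people_csv(neo4j_session)
```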

Cleaning Your Database for Testing

Often when writing tests for Neo4j it is desirable to start with a fresh database for each test. In general this can be as easy as writing a Cypher query which runs before each test:

// For versions of Neo4j before 2.3.0
// Nodes cannot be deleted without first deleting their relationships
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r

// For Neo4j 2.3.0 and later
// DETACH DELETE takes care of removing relationships for you
MATCH (n) DETACH DELETE n

In Ruby:

# Just using the `neo4j-core` gem:
neo4j_session.query('MATCH (n) DETACH DELETE n')

# When using the `neo4j` gem:
Neo4j::ActiveBase.current_session.query('MATCH (n) DETACH DELETE n')

If you are using ActiveNode and/or ActiveRel from the neo4j gem, you will no doubt have SchemaMigration nodes in the database. If you delete these nodes, the gem will complain that your migrations haven’t been run. To get around this, modify the cleaning query so that it excludes those nodes, for example by filtering on their label.
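One way to exclude the migration nodes is sketched below. It assumes they carry the label Neo4j::Migrations::SchemaMigration; check the labels in your own database before relying on this, since the label may differ between gem versions. The clean_database helper and CLEAN_QUERY constant are hypothetical names used for illustration.

```ruby
# Assumption: migration nodes are labeled `Neo4j::Migrations::SchemaMigration`.
# The backticks are required because the label contains colons.
CLEAN_QUERY = 'MATCH (n) WHERE NOT n:`Neo4j::Migrations::SchemaMigration` ' \
              'DETACH DELETE n'

# Hypothetical helper: deletes every node except the migration records.
# `session` is assumed to respond to `query` like a neo4j-core CypherSession.
def clean_database(session)
  session.query(CLEAN_QUERY)
end

# Example wiring for an RSpec suite (commented out so this file runs standalone):
# RSpec.configure do |config|
#   config.before(:each) { clean_database(Neo4j::ActiveBase.current_session) }
# end
```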

Separately, the database_cleaner gem is a popular and useful tool for abstracting away the cleaning of databases in tests. There is support for Neo4j in the database_cleaner gem, but there are a couple of problems with it:

  • Neo4j does not currently support truncation (wiping of the entire database designed to be faster than a DELETE)
  • Neo4j supports transactions, but nested transactions do not work the same as in relational databases. A failure in a nested transaction will cause the entire set of outer transactions to be rolled back. Therefore running tests inside of a transaction and rolling back a nested transaction for each test isn’t viable.

Because of this, all strategies in the database_cleaner gem amount to its “Deletion” strategy. Therefore, while you are welcome to use the database_cleaner gem, it is generally simpler to execute one of the above Cypher queries.