Query Examples¶
In the rest of the documentation for this site we try to lay out all of the pieces of the Neo4j.rb gems to explain them one at a time. Sometimes, though, it can be instructive to see examples. The following are examples of code where somebody had a question and the resulting code after fixes / refactoring. This section will expand over time as new examples are found.
Example 1: Find all contacts for a user two hops away, but don’t include contacts which are only one hop away¶
user.contacts(:contact, :knows, rel_length: 2).where_not(
uuid: user.contacts.pluck(:uuid)
)
This works, though it makes two queries. The first to get the uuid
s for the where_not
and the second for the full query. For the first query, user.contacts.pluck(:id)
could be also used instead, though associations already have a pre-defined method to get IDs, so this could instead be user.contact_ids
.
This doesn’t take care of the problem of having two queries, though. If we keep the rel_length: 2
, however, we won’t be able to reference the nodes which are one hop away in order. This seems like it would be a straightforward solution:
user.contacts(:contact1).contacts(:contact2).where_not('contact1 = contact2')
And it is straightforward, but it won’t work. Because Cypher matches one subgraph at a time (in this case roughly (:User)--(contact1:User)--(contact2:User)
), contact
one is always just going to be the node which is in between the user in question and contact2
. It doesn’t represent “all users which are one step away”. So if we want to do this as one query, we do need to first get all of the first-level nodes together so that we can then check if the second level nodes are in that list. This can be done as:
user.as(:user).contacts
.query_as(:contact).with(:user, first_level_ids: 'collect(ID(contact))')
.proxy_as(User, :user)
.contacts(:other_contact, nil, rel_length: 2)
.where_not('ID(other_contact) IN first_level_ids')
And there we have a query which is much more verbose than the original code, but accomplishes the goal in a single query. Having two queries isn’t neccessarily bad, so the code’s complexity should be weighed against how both versions perform on real datasets.
Example 2: Simple Recommendation Engine¶
If you are interested in more complex collaborative filter methods check out this article.
Let’s assume you have the following schema:
(:User)-[:FOLLOW|:SKIP]->(:Page)
We want to recommend pages for a user to follow based on their current followed pages.
Constraints:
We want to include the source of the recommendation. i.e (we recommend you follow X because you follow Y).
Note : To do this part, we are going to use an APOC function
apoc.coll.sortMaps
.We want to exclude pages the user has skipped or already follows.
The recommended pages must have a name field.
Given our schema, we could write the following Cypher to accomplish this:
MATCH (user:User { id: "1" })
MATCH (user)-[:FOLLOW]->(followed_page:Page)<-[:FOLLOW]-(co_user:User)
MATCH (co_user)-[:FOLLOW]->(rec_page:Page)
WHERE exists(rec_page.name)
AND NOT (user)-[:FOLLOW|:SKIP]->(rec_page)
WITH rec_page, count(rec_page) AS score, collect(followed_page.name) AS source_names
ORDER BY score DESC LIMIT {limit}
UNWIND source_names AS source_name
WITH rec_page, score, source_name, count(source_name) AS contrib
WITH rec_page, score, apoc.coll.sortMaps(collect({name:source_name, contrib:contrib*-1}), 'contrib') AS sources
RETURN rec_page.name AS name, score, extract(source IN sources[0..3] | source.name) AS top_sources,
size(sources) AS sources_count
ORDER BY score DESC
Now let’s see how we could write this using ActiveNode syntax in a User
Ruby class.
class User
include Neo4j::ActiveNode
property :id, type: Integer
has_many :out, :followed_pages, type: :FOLLOW, model_class: :Page
has_many :out, :skipped_pages, type: :SKIP, model_class: :Page
def recommended_pages
as(:user)
.followed_pages(:followed_page)
.where("exists(followed_page.name)")
.followers(:co_user)
.followed_pages
.query_as(:rec_page) # Transition into Core Query
.where("exists(rec_page.name)")
.where_not("(user)-[:FOLLOW|:SKIP]->(rec_page)")
.with("rec_page, count(rec_page) AS score, collect(followed_page.name) AS source_names")
.order_by('score DESC').limit(25)
.unwind(source_name: :source_names) # This generates "UNWIND source_names AS source_name"
.with("rec_page, score, source_name, count(source_name) AS contrib")
.with("rec_page, score, apoc.coll.sortMaps(collect({name:source_name,contrib:contrib*-1}), 'contrib') AS sources")
.with("rec_page.name AS name, score, extract(source in sources[0..3] | source.name) AS top_sources, size(sources) AS sources_count")
.order_by('score DESC')
.pluck(:name, :score, :top_sources, :sources_count)
end
end
Note : The contrib*-1 value is a way of getting the desired order out of the sortMaps APOC function without needing to reverse the resulting list.
This assumes we have a Page
class like the following:
class Page
include Neo4j::ActiveNode
property name, type: String
has_many :in, :followers, type: :FOLLOW, model_class: :User
has_many :in, :skippers, type: :SKIP, model_class: :User
end