Neo4j & Graph Databases: Index-Free Adjacency and Cypher

Try writing "find people connected to this person through up to six degrees of friendship" in SQL and you'll feel the wall immediately: six self-joins, exploding intermediate results, and a query that crawls more slowly the more data you have. The relational model is superb at sets and aggregates, but relationships β€” the connections between things β€” are exactly what it represents most awkwardly, as foreign keys you reconstruct with a join every single time you traverse them. Graph databases exist because for some problems the relationships are the point, and they deserve to be first-class.

Neo4j is the best-known of them, and the reason it can answer that six-degrees question quickly comes down to one internal design choice: index-free adjacency. I'll cover the property-graph model, why that storage trick makes traversal scale completely differently from joins, the Cypher query language, and β€” importantly β€” when a graph is the right tool and when it absolutely isn't.

The property graph model

A graph database stores data as a property graph, made of just two kinds of things plus annotations:

  • Nodes β€” the entities (a Person, a Product, a Bank Account). Nodes carry labels (their type) and properties (key-value attributes like name, age).
  • Relationships β€” the connections (FRIEND_OF, PURCHASED, TRANSFERRED_TO). Crucially, relationships are first-class: they have a type, a direction, and their own properties (a TRANSFERRED_TO can carry amount and date).

That last point is the conceptual shift. In a relational schema, a relationship is implicit β€” a foreign key, or a join table you assemble at query time. In a graph, a relationship is a stored object you can attach data to and traverse directly. The data model matches how we actually draw these problems on a whiteboard: circles and arrows, where the arrows matter as much as the circles.

graph LR
    A["(:Person name: Alice)"]
    B["(:Person name: Bob)"]
    C["(:Person name: Carol)"]
    P["(:Product name: Camera)"]
    A -->|"FRIEND_OF since: 2018"| B
    B -->|"FRIEND_OF since: 2019"| C
    A -->|"PURCHASED qty: 1"| P
    C -->|"PURCHASED qty: 2"| P
          

A property graph. Nodes (people, a product) carry labels and properties; relationships are typed, directed, and carry their own properties. "Which products did my friends buy?" is a short walk along FRIEND_OF then PURCHASED edges β€” a traversal, not a multi-table join reconstructed from foreign keys.

Index-free adjacency: why traversal scales

Here's the internal that makes graph databases worth their existence. In a relational database, traversing a relationship means a join: to follow "Alice's friends," the engine looks up matching rows, typically via an index β€” an operation whose cost grows with the size of the tables (a B-tree lookup is O(log n), and a multi-hop query multiplies these). The more data you have, the slower each hop gets.

Neo4j uses index-free adjacency: each node stores direct physical pointers to its relationships and neighboring nodes. Following an edge is a pointer dereference β€” it costs the same whether the graph has a thousand nodes or a billion, because you never search; you just walk. Traversal is therefore O(1) per hop, dependent only on how many edges you actually follow, not on the total graph size. That's the whole game: a six-hop query touches only the local neighborhood it walks through, while the equivalent six-way SQL join scans and matches against the full tables at every step.

The clean way to hold the difference: a relational join searches for matches; a graph traversal follows pointers it already has. This is why a query that's merely slow in SQL at small scale becomes impossible at large scale β€” the join cost compounds with data growth β€” while the same query on a graph stays fast because its cost is tied to the answer's size, not the database's size. When you hear "deep relationship queries," that's the asymmetry being described.

Cypher: querying by pattern

You don't traverse a graph with loops; you describe the pattern you're looking for and let the engine find it. Neo4j's query language, Cypher, is built around ASCII-art that literally looks like the graph: nodes in parentheses, relationships in brackets with arrows. It reads remarkably close to the question:

// "Which products did Alice's friends buy?"
MATCH (alice:Person {name: 'Alice'})-[:FRIEND_OF]->(friend)-[:PURCHASED]->(p:Product)
RETURN friend.name, p.name

// Variable-length traversal: people within 1–4 hops of Alice
MATCH (alice:Person {name: 'Alice'})-[:FRIEND_OF*1..4]->(reachable)
RETURN DISTINCT reachable.name

The *1..4 is the tell β€” variable-length path matching is a first-class operator, expressing "between one and four hops" in a few characters. Writing that in SQL means a UNION of self-joins or a recursive CTE that degrades badly. Cypher makes the queries graphs are good at also easy to write, which matters: a tool that's powerful but unreadable doesn't get used. (The industry is, around now, beginning to standardize a common graph query language, GQL, drawing heavily on Cypher.)

When a graph wins β€” and when it doesn't

Graph databases are a sharp tool, not a general-purpose one, and matching them to the right problem is the whole skill. They win decisively when the relationships and their traversal are the core of the question:

  • Fraud detection β€” finding rings: accounts connected through shared devices, addresses, or rapid transfer chains (a multi-hop pattern that's the definition of a graph query).
  • Recommendations β€” "people who bought what your connections bought," collaborative paths through a purchase graph.
  • Network and IT topology β€” impact analysis: "if this switch fails, what downstream depends on it?"
  • Knowledge graphs β€” entities and their relationships as a queryable web, increasingly the backbone of richer search and (looking ahead) retrieval for AI systems.

Don't make a graph database your system of record for everything. The failure mode is using a graph for workloads it's bad at: large aggregations ("total revenue by month across all orders") scan huge swaths of the graph and a columnar warehouse will crush it; high-volume transactional CRUD is better served by a relational store; and horizontal scaling of a densely connected graph across machines is genuinely hard (you can't cleanly partition something whose value is its connectedness). The right pattern is usually a graph alongside your other stores β€” pointed at the specific relationship-heavy questions β€” not in place of them. Use it for traversals; keep aggregates and bulk transactions where they belong.

One more positioning note: the property graph (Neo4j, Cypher) isn't the only model β€” RDF triplestores with SPARQL serve a more standards-and-semantics-driven world (linked data, ontologies). For most application and analytics use cases the property-graph model is the more pragmatic, more performant choice; reach for RDF when interoperable, formally-defined semantics are the actual requirement.

What to carry away

Graph databases make relationships first-class. The property-graph model (labeled nodes and typed, directed, property-bearing relationships) matches how connected problems are actually shaped, and index-free adjacency is the internal that makes them worth it: relationships are stored as direct pointers, so traversal costs O(1) per hop regardless of total size β€” exactly where relational joins, whose cost compounds with data growth, fall apart. Cypher makes those traversals readable, with variable-length paths as a first-class operator.

Reach for a graph when the question is the relationships β€” fraud rings, recommendations, topology, knowledge graphs β€” and keep bulk aggregations and high-volume transactions in the stores built for them. Used that way, a graph answers questions that are slow-to-impossible elsewhere. It's also the substrate beneath knowledge-graph-driven AI: the same traversals power GraphRAG and the graph-backed retrieval that vector search alone can't do.