Graph Databases & Knowledge Graphs 2026 Deep Dive — Neo4j 5 · ArangoDB · Memgraph · TigerGraph · Amazon Neptune · Apache AGE · Kuzu · FalkorDB · Dgraph · GraphRAG

"GraphRAG enables global summarization that simple vector RAG cannot answer. We have entered an era where a single question can be answered against a million documents synthesized into one structure." — Microsoft Research, GraphRAG paper (April 2024)

Graph databases entered their second golden age between 2024 and 2026. The first one was the fraud-detection and recommendation-system boom of the late 2010s; the second is the era of LLMs, GraphRAG, and agent memory. When Microsoft released GraphRAG in April 2024 and OpenAI adopted knowledge-graph patterns for memory, GitHub stars for Neo4j, ArangoDB, and FalkorDB exploded — and in the same month, ISO adopted GQL (Graph Query Language) as the first new ISO database query standard since SQL.

As of May 2026, the graph DB world is split into six camps: property-graph engines (Neo4j / Memgraph / TigerGraph), RDF triple stores (GraphDB / Stardog / Virtuoso), multi-model engines (ArangoDB / SurrealDB), cloud managed services (Amazon Neptune / Cosmos DB Gremlin), embedded engines (Kuzu / FalkorDB), and Postgres extensions (Apache AGE). This piece walks through each camp, the four major query languages, the GraphRAG pattern, and real production cases.

1. The 2026 Graph DB Map — Six Camps

The 2026 graph DB ecosystem is multipolar with no single winner.

| Camp | Representative Tools | Core Trait |
| --- | --- | --- |
| Property Graph (LPG) | Neo4j 5, Memgraph, TigerGraph | Nodes / relationships / properties, Cypher/GQL |
| RDF Triple Store | GraphDB, Stardog, Virtuoso, Apache Jena | Subject-predicate-object, SPARQL |
| Multi-model | ArangoDB, SurrealDB, OrientDB | Document + graph + KV |
| Cloud Managed | Amazon Neptune, Cosmos DB Gremlin, AuraDB | Managed SaaS |
| Embedded | Kuzu, FalkorDB, NetworkX | Local / analytics workloads |
| Postgres Extension | Apache AGE, pgRouting | Graph on top of Postgres |

The core insight: graph DBs are no longer a niche; they are RAG / agent / MDM infrastructure. Microsoft's April 2024 GraphRAG paper, Neo4j's LLM Knowledge Graph Builder, and FalkorDB's GraphRAG package all converged, and AWS launched Neptune Analytics in the same period. Graphs now sit beside vector DBs as a standard component of the RAG stack.

2. The Rise of GraphRAG — The April 2024 Inflection Point

The paper "From Local to Global: A Graph RAG Approach to Query-Focused Summarization", released by Microsoft Research in April 2024, was an inflection point for the graph DB market. Traditional vector RAG ("retrieve the top-5 similar chunks from a million documents and stuff them into the LLM") is weak on global questions like "what are the overall themes here?". GraphRAG instead uses the LLM to extract entities and relationships into a knowledge graph and then runs community detection over that graph to build hierarchical summaries.

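# Microsoft GraphRAG basic flow (conceptual pseudocode: build_index, GlobalSearch(index),
# and LocalSearch(index) are illustrative names, not the actual graphrag package API)
from graphrag.index import build_index
from graphrag.query import GlobalSearch, LocalSearch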

# 1. Indexing — decompose documents into entities/relationships via LLM
index = build_index(
    documents="./data/*.txt",
    llm_model="gpt-4o",
    embedding_model="text-embedding-3-small",
    community_levels=3,  # 3 levels of Leiden community detection
)

# 2. Global search — answer over community summaries
global_search = GlobalSearch(index)
answer = global_search.search("What are the central themes in this corpus?")

# 3. Local search — explore around a specific entity
local_search = LocalSearch(index)
answer = local_search.search("How are Sam Altman and the OpenAI board related?")
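
The indexing idea described above can be sketched without an LLM. The following is a minimal illustration, assuming the extraction step has already produced (source, relation, target) triples; the `TRIPLES` data is hypothetical, and connected components are used as a deliberately simple stand-in for Leiden community detection. It is not how the graphrag package is implemented.

```python
from collections import defaultdict

# Hypothetical output of an LLM entity/relationship extraction step.
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Acme", "ACQUIRED", "Widgets Inc"),
    ("Carol", "AUTHORED", "Graph Paper"),
    ("Dave", "REVIEWED", "Graph Paper"),
]


def build_graph(triples):
    """Undirected adjacency map over the extracted entities."""
    adj = defaultdict(set)
    for src, _rel, dst in triples:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def communities(adj):
    """Connected components, used here as a simple stand-in for Leiden."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def community_reports(triples):
    """One plain-text report per community; a real pipeline would have an LLM summarize it."""
    reports = []
    for group in communities(build_graph(triples)):
        facts = [f"{s} {r} {d}" for s, r, d in triples if s in group]
        reports.append({"entities": sorted(group), "facts": facts})
    return reports


reports = community_reports(TRIPLES)
for report in reports:
    print(report["entities"], "->", "; ".join(report["facts"]))
```

A global question is then answered over the per-community reports rather than over raw chunks, which is why this style of index can handle "what are the overall themes?" questions.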

The implication of GraphRAG was a redefinition: "graph DBs are the memory infrastructure of the LLM era." Neo4j, FalkorDB, Kuzu, and Memgraph quickly shipped LLM-integrated tooling to match the moment, and by 2025 LangChain and LlamaIndex both added Property Graph indexes as a first-class module.

3. Neo4j 5 — The De Facto Standard

Neo4j (Neo4j Inc., founded 2007, $325M Series F in 2021) is the de facto standard for graph DBs. As of May 2026 the Neo4j 5.x line is the stable series, and three properties dominate.

First, the Cypher query language — ASCII-art-style pattern matching such as MATCH (a)-[:KNOWS]->(b) RETURN a, b that makes graph patterns intuitive. Second, ACID transactions — a real OLTP graph engine with atomic multi-node, multi-relationship updates. Third, the Graph Data Science (GDS) library — 70+ algorithms including PageRank, community detection, shortest path, and node embeddings, runnable on a single node without GPUs.

// Neo4j 5 Cypher — movie recommendation graph pattern
MATCH (user:User {id: $userId})-[:RATED]->(movie:Movie)<-[:RATED]-(other:User),
      (other)-[:RATED]->(rec:Movie)
WHERE NOT (user)-[:RATED]->(rec)
WITH rec, COUNT(*) AS strength
RETURN rec.title, strength
ORDER BY strength DESC
LIMIT 10

Major Neo4j 5 features include Composite Database (combining multiple graphs in one query), Vector Index (added in 2023, lets you store and search OpenAI-style embeddings on graph nodes), and Parallel Runtime (parallel read-query execution). Licensing splits into Community Edition (GPLv3), Enterprise (commercial), and AuraDB (SaaS).

4. Cypher and GQL — The Graph Query That Became an ISO Standard

Cypher was created at Neo4j in 2011 as a graph query language and opened up via the openCypher project in 2015. It became the main input to GQL (Graph Query Language), which was published as ISO/IEC 39075 in April 2024. GQL is the first new ISO database query language adopted since SQL (1987).

// 1. CREATE — make nodes and a relationship
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (alice)-[:KNOWS {since: 2018}]->(bob)

// 2. MATCH — pattern matching
MATCH (p:Person)-[:KNOWS*1..3]->(friend)
WHERE p.name = 'Alice'
RETURN DISTINCT friend.name

// 3. MERGE — UPSERT
MERGE (city:City {name: 'Seoul'})
ON CREATE SET city.population = 9700000
ON MATCH SET city.last_seen = datetime()

GQL draws heavily on Cypher and is maintained by the same ISO/IEC committee (JTC 1/SC 32) that maintains SQL (ISO/IEC 9075). Neo4j, SAP HANA, TigerGraph, and Memgraph are all working on GQL compliance, and most property graph databases are expected to support GQL 1.0 by the end of 2026.

5. Memgraph — In-Memory, Cypher-Compatible

Memgraph (Memgraph Inc., 2016–, Croatia) is an in-memory graph DB that is Cypher-compatible with Neo4j. Three differentiators stand out. First, it is in-memory first with disk persistence: all data lives in RAM, with a claimed 5–10× faster query response than Neo4j (the actual gain depends on the workload). Second, the MAGE library, a collection of graph algorithms and LLM tooling written in C++/Python/Rust. Third, Memgraph Lab, a visual IDE.

# Run Memgraph Platform via Docker
docker run -p 7687:7687 -p 7444:7444 -p 3000:3000 \
  --name memgraph memgraph/memgraph-platform

# Open the Cypher shell
docker exec -it memgraph mgconsole

# Call an in-memory graph algorithm (MAGE)
CALL pagerank.get() YIELD node, rank
RETURN node.name, rank ORDER BY rank DESC LIMIT 10;

Since 2024, Memgraph has added features dedicated to GraphRAG and agent memory; MAGE now ships LLM-driven entity extraction algorithms, node2vec, and modules such as text-to-cypher. Licensing splits into Community (BSL 1.1, Apache 2.0 after a delay window) and Enterprise.

6. TigerGraph — Distributed Graph + GSQL

TigerGraph (TigerGraph Inc., 2012–, USA) is a commercial graph DB specialized for distributed processing. Its differentiators are GSQL, a SQL-friendly graph query language, and an MPP (massively parallel processing) architecture. It is designed for fast multi-hop queries (>3 hops) at the scale of tens of billions of nodes and hundreds of billions of edges.

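// GSQL example: friend-of-friend (2-hop) recommendation
CREATE QUERY recommend_friends(VERTEX<Person> seed) FOR GRAPH SocialNet {
  SumAccum<INT> @score;   // per-vertex counter: how many friends lead to this vertex
  Start = {seed};

  // hop 1: direct friends of the seed
  Friends = SELECT v FROM Start:s -(KNOWS:e)- :v;
  // hop 2: friends of friends, ranked by the number of connecting friends
  FoFs    = SELECT v FROM Friends:s -(KNOWS:e)- :v
            WHERE v != seed
            ACCUM v.@score += 1
            ORDER BY v.@score DESC
            LIMIT 10;

  PRINT FoFs[FoFs.@score];
}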

TigerGraph has been deployed for fraud detection at large enterprises and financial institutions such as SK Telecom, the People's Bank of China, and ICBC; in 2024 it shipped a GraphRAG module. Licensing is commercial but offers TigerGraph Cloud (SaaS) and the TigerGraph DB Free Tier (capped at 1B edges).

7. ArangoDB — The Multi-Model Standard-Bearer

ArangoDB (ArangoDB Inc., 2014–) is a multi-model database that handles documents, graphs, and key-value pairs in a single engine. Its core value is AQL (ArangoDB Query Language), which can mix documents, graphs, and KV access in a single query.

// AQL — graphs, documents, and filters in one query
FOR user IN users
  FILTER user.country == "KR"
  FOR friend IN 2..3 OUTBOUND user knows_edges
    FILTER friend.age >= 18
    RETURN { user: user.name, friend: friend.name }

ArangoDB added GraphML integration in 2024 and Vector Search in the 3.12 line (2025); the ArangoSearch + Vector pairing is widely used for RAG workloads. Licensing splits into Community and Enterprise; note that the Community source moved from Apache 2.0 to BSL 1.1 starting with 3.12. The managed SaaS is ArangoGraph (formerly Oasis).

8. Amazon Neptune — AWS's Managed Graph

Amazon Neptune (2018–, AWS) is a managed graph DB supporting property graph (Gremlin / openCypher) and RDF (SPARQL) in one engine. AWS added Neptune ML (GNN-driven predictions) in 2020, openCypher support in 2022, and Neptune Analytics (an in-memory analytics engine) in late 2023, and has since been promoting openCypher to first-class status.

# Neptune Gremlin (Python, gremlinpython driver)
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection(
    'wss://my-cluster.cluster-xxx.us-east-1.neptune.amazonaws.com:8182/gremlin',
    'g'
)
g = traversal().withRemote(conn)

# friend-of-friend recommendation
result = (g.V().has('person', 'name', 'Alice')
            .out('knows').out('knows')
            .dedup()
            .limit(10)
            .values('name')
            .toList())

Neptune's biggest selling point is AWS integration: IAM, VPC, CloudWatch, and SageMaker all wire in naturally; backup, HA, and encryption are automatic. Neptune Analytics can finish 50B-edge analytical workloads in minutes, which makes it a strong fit for building GraphRAG indexes.

9. Apache AGE — Graph on Top of Postgres

Apache AGE (A Graph Extension, donated by BitNine in 2020, Apache 2.0) is a PostgreSQL extension that overlays openCypher onto Postgres. It is the most natural choice when you want to add graph queries to an existing Postgres deployment.

-- Apache AGE — run openCypher inside Postgres
CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- create a graph
SELECT create_graph('social');

-- run Cypher embedded in SQL
SELECT * FROM cypher('social', $$
  CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
  RETURN a, b
$$) AS (a agtype, b agtype);

-- pattern search
SELECT * FROM cypher('social', $$
  MATCH (p:Person)-[:KNOWS*1..3]->(f:Person)
  RETURN p.name, f.name
$$) AS (person_name agtype, friend_name agtype);

AGE's value proposition is "reuse the entire Postgres transactional, indexing, replication, and extension ecosystem." You can run it alongside pgvector, PostGIS, and TimescaleDB in the same database, which makes it slot naturally into RAG stacks. Its limitation is performance — deep multi-hop queries lag dedicated graph engines.

10. Kuzu — Embedded Analytical Graph DB

Kuzu (University of Waterloo / Kuzu Inc., 2022–, MIT/Apache 2.0) is an embedded graph database styled as the DuckDB of graphs. It installs as a single binary or Python wheel and is optimized for analytical (OLAP) workloads.

import kuzu
import pandas as pd

# Create an embedded DB
db = kuzu.Database("./mydb")
conn = kuzu.Connection(db)

# Define schema
conn.execute("CREATE NODE TABLE Person(id INT64, name STRING, PRIMARY KEY(id))")
conn.execute("CREATE REL TABLE Knows(FROM Person TO Person, since INT64)")

# Load from a Pandas DataFrame
people_df = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
conn.execute("COPY Person FROM people_df")

# Cypher query
result = conn.execute("""
  MATCH (a:Person)-[:Knows*1..3]->(b:Person)
  WHERE a.name = 'A'
  RETURN DISTINCT b.name
""")
print(result.get_as_df())

Since 2024, Kuzu has integrated vector indexes and full-text search, with LlamaIndex and LangChain integrations following. Its embedded model lets a laptop analyze billion-edge graphs, making it very popular for GraphRAG prototypes.

11. FalkorDB — Sparse Matrix on Redis

FalkorDB (2023, fork of RedisGraph) is a graph DB built as a Redis module that uses sparse-matrix representations under the hood. After Redis discontinued RedisGraph in 2023, the core contributors forked it forward as FalkorDB.

The underlying technique is linear-algebra-based graph computation: the graph is stored as sparse adjacency matrices, and traversals are executed as sparse matrix multiplications via the GraphBLAS library. That is what makes multi-hop queries fast.
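
To make the matrix view concrete, here is a minimal pure-Python sketch of a multi-hop traversal as repeated boolean sparse matrix multiplication. The `spmm_bool` helper and the toy graph are assumptions for illustration only; this is neither FalkorDB's code nor the GraphBLAS API, just the same idea at toy scale.

```python
def spmm_bool(a, b):
    """Boolean sparse matrix product: out[i] is the union of b[k] for every k in a[i]."""
    out = {}
    for i, cols in a.items():
        reached = set()
        for k in cols:
            reached |= b.get(k, set())
        if reached:
            out[i] = reached
    return out


# Sparse adjacency matrix stored row by row: A[i] is the set of columns j with A[i][j] = 1.
A = {
    "alice": {"bob"},
    "bob": {"carol", "dave"},
    "carol": {"erin"},
}

A2 = spmm_bool(A, A)    # pairs connected by a walk of exactly 2 hops
A3 = spmm_bool(A2, A)   # pairs connected by a walk of exactly 3 hops

print(sorted(A2["alice"]))
print(sorted(A3["alice"]))
```

Because only non-zero entries are stored and touched, the cost of each hop follows the number of edges actually traversed rather than the square of the node count.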

# FalkorDB — Redis-backed Cypher
from falkordb import FalkorDB

db = FalkorDB(host='localhost', port=6379)
graph = db.select_graph('social')

# create nodes and a relationship
graph.query("""
  CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
""")

# pattern search
result = graph.query("""
  MATCH (p:Person)-[:KNOWS*1..3]->(f:Person)
  RETURN DISTINCT f.name
""")
for row in result.result_set:
    print(row[0])

FalkorDB's most striking differentiator is the GraphRAG SDK — feed in documents and it gives you the full pipeline: LLM-driven entity extraction, graph construction, and natural-language-to-Cypher translation. 2025 brought official integrations with LangChain and Microsoft AutoGen.

12. Dgraph — Distributed Native GraphQL

Dgraph (Dgraph Labs, rebranded to Hypermode, 2016–, Apache 2.0) is a distributed graph DB that adopts native GraphQL as a first-class query language. The parent company rebranded as Hypermode in 2024 to focus more on LLM and agent workloads.

# Dgraph — GraphQL query
query {
  queryUser(filter: {country: {eq: "KR"}}) {
    name
    friends {
      name
      age
    }
  }
}

# DQL (Dgraph Query Language)
{
  users(func: eq(country, "KR")) {
    name
    friends @filter(ge(age, 18)) {
      name
      age
    }
  }
}

Dgraph offers distributed processing on Raft consensus, automatic sharding, and ACID transactions, targeting sub-second response time even at billions of nodes. As of May 2026, Hypermode has launched a SaaS that layers agent memory and MCP integration on top of Dgraph.

13. RDF and SPARQL — The Semantic Web Lineage

If property graphs model the world as "nodes and edges with properties", RDF (Resource Description Framework, W3C 1999/2014) models everything as subject-predicate-object triples. It is a W3C standard and pairs with the OWL ontology language and the SHACL validation language.

# SPARQL — fetch a list of South Korean presidents from DBpedia
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?president ?name ?term WHERE {
  ?president dbo:office dbr:President_of_South_Korea .
  ?president rdfs:label ?name .
  ?president dbo:termPeriod ?term .
  FILTER(LANG(?name) = "ko")
}
ORDER BY ?term
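
The triple model and the way a SPARQL basic graph pattern joins triple patterns can be sketched in a few lines of Python. This is a toy matcher under stated assumptions: the `ex:` data is invented, terms starting with `?` are treated as variables, and no real triple store works by scanning every triple like this.

```python
TRIPLES = {
    ("ex:Seoul", "rdf:type", "ex:City"),
    ("ex:Seoul", "ex:capitalOf", "ex:South_Korea"),
    ("ex:Tokyo", "rdf:type", "ex:City"),
    ("ex:Tokyo", "ex:capitalOf", "ex:Japan"),
    ("ex:Busan", "rdf:type", "ex:City"),
}


def match(pattern, bindings=None):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    bindings = bindings or {}
    for triple in TRIPLES:
        candidate = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns):
    """Join several triple patterns on their shared variables."""
    results = [{}]
    for pattern in patterns:
        results = [b for partial in results for b in match(pattern, partial)]
    return results


rows = query([("?city", "rdf:type", "ex:City"), ("?city", "ex:capitalOf", "?country")])
capitals = sorted((r["?city"], r["?country"]) for r in rows)
print(capitals)
```

Busan matches the first pattern but drops out at the second, which is the same join behavior the `?president` variable has in the SPARQL query above.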

RDF dominates academia, government, libraries, and life sciences. Massive public knowledge graphs like DBpedia, Wikidata, UniProt, and DrugBank are all published as RDF. The downside is a steep learning curve — the triple model is less intuitive than property graphs, and SPARQL is more complex than SQL or Cypher.

14. RDF Triple Stores — GraphDB · Stardog · Virtuoso · Jena

| Tool | Vendor | License | Notes |
| --- | --- | --- | --- |
| Ontotext GraphDB | Ontotext | Commercial + Free | Strong reasoning, Linked Data |
| Stardog | Stardog Union | Commercial | KG platform, virtual graphs |
| OpenLink Virtuoso | OpenLink | Commercial + GPL | DBpedia / UniProt backend, SQL + SPARQL |
| Apache Jena Fuseki | Apache | Apache 2.0 | The most popular OSS Jena server |
| Blazegraph | (legacy) | GPLv2 | Wikidata backend; development halted |
| AllegroGraph | Franz | Commercial | Enterprise, neuro-symbolic |
| RDF4J | Eclipse | EDL | Sesame successor, Java library |

The Wikidata Query Service has run on Blazegraph since 2015, but it hit scalability limits in 2024 and is migrating to QLever (Universität Freiburg). That migration is symbolic of the generational shift among RDF triple stores.

15. Apache TinkerPop and Gremlin — Graph's Attempt at SQL

Apache TinkerPop (2009–) is an abstraction layer for graph databases that ships the Gremlin traversal language. It lets you use the same code against most property graph DBs — Neo4j, Neptune, JanusGraph, Cosmos DB, OrientDB, and so on.

# Gremlin — TinkerPop standard
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import P  # predicates such as P.gte

g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

# friend-of-friend (2-hop) recommendation
result = (g.V()
           .has('person', 'name', 'Alice')
           .out('knows')
           .out('knows')
           .dedup()
           .has('age', P.gte(18))
           .limit(10)
           .values('name')
           .toList())

JanusGraph (a Linux Foundation project under the Apache 2.0 license, the Titan successor), OrientDB, and Cosmos DB Gremlin all adopt Gremlin as a first-class language. The downside is the learning curve: the functional / chaining style is less intuitive than Cypher. With GQL adopted by ISO in 2024, Gremlin is gradually losing influence.

16. SurrealDB · OrientDB · Aerospike Graph

SurrealDB (2022–, Rust, BSL → Apache 2.0) is a multi-model DB (document, graph, KV, time series) that uses a SurrealQL syntax close to SQL. Since its 1.0 GA in 2023 it has been gaining traction quickly, and it supports embedded, server, and serverless modes.

OrientDB (2010–, sponsored by SAP, Apache 2.0) is a first-generation multi-model graph DB with OrientDB Studio, Gremlin, and an SQL extension. Once neck and neck with ArangoDB, its growth has slowed in recent years.

Aerospike Graph (launched 2023) layers Gremlin on top of the Aerospike key-value engine, focusing on ad-tech identity graphs and real-time recommendations.

-- SurrealQL — graph traversal in a SQL-shaped syntax
SELECT name, ->knows->person.name AS friends
FROM person
WHERE country = "KR"
LIMIT 10;

-- multi-hop traversal
SELECT ->knows->person->knows->person.name AS fofs
FROM person:alice;

SurrealDB ships live queries (WebSocket change subscriptions), built-in permissions and auth, and SDKs for JS, Python, and Rust. In 2026 it added SurrealKV (embedded) and SurrealCloud (managed).

17. JanusGraph and ArcadeDB — Distributed Graphs in the Apache Camp

JanusGraph (Apache 2.0, successor to Titan, 2017–) is a distributed graph DB that runs on top of Apache Cassandra, HBase, or BigTable. It is deployed at PayPal, Uber, IBM, and NASA, and has been validated at scales beyond ten billion edges.

# JanusGraph server via Docker (the default image runs on an embedded BerkeleyDB backend)
docker run --name janus -p 8182:8182 janusgraph/janusgraph

// JanusGraph Gremlin console: open a graph backed by Cassandra (CQL)
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
g = graph.traversal()
g.addV('person').property('name', 'Alice').next()

ArcadeDB (2021, the next-generation OrientDB successor, Apache 2.0) is a newer multi-model DB (graph, document, KV, search) that packs everything into a single engine and supports OrientDB-compatible APIs alongside Cypher, Gremlin, and SQL. It runs in embedded, server, and cluster modes.

18. Microsoft GraphRAG · LlamaIndex · LangChain

The GraphRAG tooling camp exploded between 2024 and 2026.

| Tool | Position | Trait |
| --- | --- | --- |
| Microsoft GraphRAG | Python library | The original, community-based summarization |
| Neo4j LLM Knowledge Graph Builder | Web UI + library | PDF → graph, built on Neo4j |
| LlamaIndex Property Graph Index | Python | Modular, swappable graph backends |
| LangChain Knowledge Graph | Python / JS | LLMChain + Cypher translation |
| FalkorDB GraphRAG SDK | Python | Redis-backed, quick prototypes |
| Memgraph LangChain | Python | In-memory + GraphRAG |
| DSPy + Graph | Python | Prompt auto-optimization + graphs |
| Cognee | Python | LLM-driven graph memory |

# LlamaIndex Property Graph Index — Neo4j backend
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("./data").load_data()

graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
)

index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    embed_model=OpenAIEmbedding(),
    llm=OpenAI(model="gpt-4o"),
)

# natural-language question -> graph retrieval -> answer
# (text-to-Cypher translation requires an explicit TextToCypherRetriever)
query_engine = index.as_query_engine(include_text=True)
print(query_engine.query("Summarize the relationships between the main characters."))

19. Vector DB vs Graph DB vs Hybrid

The most important design question in the RAG era is "vector DB, graph DB, or hybrid?"

  • Vector DB only (Pinecone, Weaviate, Qdrant, pgvector): great for semantic search and document Q&A, weak on global questions and multi-hop reasoning.
  • Graph DB only (Neo4j, ArangoDB): strong when you have clear entity / relationship structure (knowledge graphs, fraud); unstructured text needs a separate extraction step.
  • Hybrid (Neo4j + vector index, FalkorDB + embeddings): the most common GraphRAG pattern. Store embeddings on graph nodes, use vector search for entry points, then expand via graph traversal.
// Neo4j 5: putting a vector index on a node (hybrid pattern)
// 1. Create the vector index
CALL db.index.vector.createNodeIndex(
  'movie-embeddings', 'Movie', 'embedding', 1536, 'cosine'
)

// 2. Create a movie node with an embedding
CREATE (m:Movie {title: 'Matrix', embedding: [0.123, 0.456, ...]})

// 3. Combined vector + graph search: the index call supplies the entry points
CALL db.index.vector.queryNodes('movie-embeddings', 10, $query_emb)
YIELD node AS similar, score
MATCH (similar)<-[:ACTED_IN]-(actor:Person)-[:ACTED_IN]->(other:Movie)
RETURN other.title, score, COUNT(actor) AS shared_actors
ORDER BY shared_actors DESC, score DESC LIMIT 10
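
The two-step hybrid pattern (vector search for entry points, then graph expansion) can also be sketched in plain Python. Everything here is a toy assumption: the 3-dimensional embeddings, the `ACTED_IN` map, and the brute-force cosine ranking that stands in for a real vector index.

```python
import math

# Hypothetical toy data: 3-dimensional embeddings and ACTED_IN edges.
EMBEDDINGS = {
    "Matrix": [0.9, 0.1, 0.0],
    "John Wick": [0.8, 0.2, 0.1],
    "Notting Hill": [0.0, 0.1, 0.9],
}
ACTED_IN = {
    "Keanu Reeves": {"Matrix", "John Wick", "Speed"},
    "Hugh Grant": {"Notting Hill", "Paddington 2"},
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_entry_points(query_emb, k):
    """Step 1: top-k nodes by cosine similarity (brute force instead of an index)."""
    ranked = sorted(EMBEDDINGS, key=lambda t: cosine(query_emb, EMBEDDINGS[t]), reverse=True)
    return ranked[:k]


def expand_via_actors(seeds):
    """Step 2: follow (seed)<-ACTED_IN-(actor)-ACTED_IN->(other) and count shared actors."""
    seed_set, shared = set(seeds), {}
    for movies in ACTED_IN.values():
        if movies & seed_set:
            for other in movies - seed_set:
                shared[other] = shared.get(other, 0) + 1
    return shared


seeds = vector_entry_points([1.0, 0.0, 0.0], k=2)
related = expand_via_actors(seeds)
print(seeds, related)
```

The graph step surfaces "Speed", a movie with no embedding at all, which is the kind of result that vector similarity alone cannot return.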

20. Fraud Detection — PayPal · Banks · Carriers

Fraud detection is the oldest killer app for graph DBs. PayPal adopted Neo4j in the 2010s to fight credit-card fraud, account takeovers, and money laundering. The core pattern is multi-hop exploration like "clusters of accounts that share the same device, IP, or phone number."

// Detect a suspicious cluster of accounts
MATCH (a:Account)-[:USED_DEVICE]->(d:Device)
WITH d, COUNT(DISTINCT a) AS accounts_sharing
WHERE accounts_sharing > 5
RETURN d.id AS suspicious_device, accounts_sharing
ORDER BY accounts_sharing DESC LIMIT 20
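
The "clusters of accounts that share a device, IP, or phone number" pattern is transitive, which a single-device count does not capture. A minimal union-find sketch of that clustering follows; the observation data and the `acct` naming convention are hypothetical, and a production system would run this inside the graph engine rather than in application code.

```python
from collections import defaultdict

# Hypothetical (account, identifier) observations; identifiers can be devices, IPs, or phones.
OBSERVATIONS = [
    ("acct1", "device:A"),
    ("acct2", "device:A"),
    ("acct2", "ip:10.0.0.7"),
    ("acct3", "ip:10.0.0.7"),
    ("acct4", "phone:555-0100"),
]


def find(parent, x):
    """Find the cluster representative, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def account_clusters(observations):
    """Group accounts linked, directly or transitively, by a shared identifier."""
    parent = {}
    for account, identifier in observations:
        parent.setdefault(account, account)
        parent.setdefault(identifier, identifier)
        root_a, root_i = find(parent, account), find(parent, identifier)
        if root_a != root_i:
            parent[root_a] = root_i
    clusters = defaultdict(set)
    for node in parent:
        if node.startswith("acct"):
            clusters[find(parent, node)].add(node)
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


clusters = account_clusters(OBSERVATIONS)
print(clusters)
```

Here acct1 and acct3 never share an identifier directly, yet they land in the same cluster through acct2, which is the multi-hop signal fraud teams look for.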

In Korea, Shinhan Card, KB Card, and Woori Bank have partially deployed graph-based fraud detection; in Japan, SMBC and Rakuten Card use TigerGraph and Neo4j. In the US, Equifax and Experian operate credit graphs and carriers like Verizon and AT&T leverage graphs for SIM-swap fraud detection.

21. Recommendation Systems — Netflix · eBay · LinkedIn

Recommendation systems are the second killer app. They run GNNs, PageRank, or node embeddings over a user-item-attribute graph to generate suggestions.

Netflix operates internal graphs over movies, series, tags, and users, and uses a mix of Neo4j and proprietary graph systems for parts of it. eBay runs the product catalog on Ontotext GraphDB to power multilingual search and synonym handling. LinkedIn's People You May Know runs on LIquid, an in-house graph system.

// Movie recommendation — shared-genre + shared-actor graph pattern
MATCH (user:User {id: $userId})-[:WATCHED]->(movie:Movie),
      (movie)-[:HAS_GENRE]->(g:Genre)<-[:HAS_GENRE]-(rec:Movie),
      (movie)<-[:ACTED_IN]-(actor:Person)-[:ACTED_IN]->(rec)
WHERE NOT (user)-[:WATCHED]->(rec)
WITH rec, COUNT(DISTINCT g) AS shared_genres,
          COUNT(DISTINCT actor) AS shared_actors
RETURN rec.title, shared_genres + shared_actors AS score
ORDER BY score DESC LIMIT 10
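
Of the techniques named above, PageRank is the easiest to show end to end. The sketch below runs personalized PageRank by power iteration over a tiny user-movie graph; the edge list, the `u:` / `m:` prefixes, and the parameter defaults are assumptions for illustration, not any vendor's implementation.

```python
# Hypothetical WATCHED edges between users (u:) and movies (m:).
EDGES = [
    ("u:ann", "m:Matrix"),
    ("u:ann", "m:Speed"),
    ("u:bob", "m:Matrix"),
    ("u:bob", "m:John Wick"),
    ("u:cat", "m:Notting Hill"),
]


def personalized_pagerank(edges, source, damping=0.85, iterations=50):
    """Power iteration in which every random jump returns to `source`."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for a, b in edges:              # treat the graph as undirected
        out[a].append(b)
        out[b].append(a)
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) if n == source else 0.0 for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(out[n])
            for m in out[n]:
                nxt[m] += share
        rank = nxt
    return rank


rank = personalized_pagerank(EDGES, "u:ann")
watched = {movie for user, movie in EDGES if user == "u:ann"}
recs = sorted(
    (n for n in rank if n.startswith("m:") and n not in watched),
    key=rank.get,
    reverse=True,
)
print(recs)
```

Movies reachable from the user through shared viewers accumulate score, while movies in a disconnected part of the graph stay at zero.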

22. Drug Discovery and Life Sciences — Pfizer · Roche · BenevolentAI

Life sciences is a natural fit for graph DBs: protein-gene-disease-drug relationships are intrinsically multi-relational. Pfizer, AstraZeneca, and Roche use Neo4j and RDF triple stores (GraphDB, Stardog) for drug-candidate discovery.

BenevolentAI (UK) runs its own knowledge graph BKG (Benevolent Knowledge Graph) with 7B+ triples to power discovery for ALS and COVID-19; in 2020 it famously surfaced baricitinib as a COVID-19 treatment via this graph. UniProt, DrugBank, ChEMBL, and OpenTargets are well-known public biomedical knowledge bases, several of which are published as RDF.

# OpenTargets-style traversal: disease -> target gene -> drug
# (illustrative schema: the otar: predicates below are hypothetical)
PREFIX otar: <http://identifiers.org/opentargets/>

SELECT ?disease ?target ?drug WHERE {
  ?disease otar:efoId "EFO_0000249" .  # Alzheimer's disease
  ?disease otar:associatedTarget ?target .
  ?target otar:knownDrugs ?drug .
  ?drug otar:phase ?phase .
  FILTER(?phase >= 3)
}

23. Identity Resolution · MDM · Cybersecurity

Identity resolution is the practice of unifying identity fragments across channels and devices into a single person. It is foundational infrastructure for marketing 360° customer views, KYC, and AML, and graph DBs are the standard tool.

MDM (Master Data Management) platforms such as Reltio, Tamr, Stibo, and Profisee rely on graph-style models internally. Modeling product, customer, and supplier relationships is naturally graph-shaped.

In cybersecurity, BloodHound (Active Directory privilege graph) is the standard tool for penetration testing, and Stellar Cyber, Lacework, and Wiz operate cloud-asset and vulnerability graphs.

// BloodHound — shortest path of privileges in Active Directory
MATCH p=shortestPath(
  (u:User {name: 'low_priv_user'})-[*1..6]->(da:Group {name: 'DOMAIN ADMINS'})
)
RETURN p
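
What `shortestPath` with a `*1..6` bound computes can be sketched as a breadth-first search with a hop limit. The edge list below is hypothetical (the relationship names mimic BloodHound's style), and the function is a simplified stand-in for the database's own path finder.

```python
from collections import deque

# Hypothetical privilege edges: (source, relationship, target).
EDGES = [
    ("low_priv_user", "MemberOf", "HELPDESK"),
    ("HELPDESK", "GenericAll", "svc_backup"),
    ("svc_backup", "AdminTo", "DC01"),
    ("DC01", "HasSession", "da_admin"),
    ("da_admin", "MemberOf", "DOMAIN ADMINS"),
    ("low_priv_user", "MemberOf", "EVERYONE"),
]


def shortest_path(edges, start, goal, max_hops=6):
    """Breadth-first search over directed edges; returns the node list or None."""
    adj = {}
    for src, _rel, dst in edges:
        adj.setdefault(src, []).append(dst)
    queue, parent = deque([(start, 0)]), {start: None}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            path = []
            while node is not None:     # walk the parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if hops == max_hops:
            continue                    # do not expand beyond the hop limit
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, hops + 1))
    return None


path = shortest_path(EDGES, "low_priv_user", "DOMAIN ADMINS")
print(" -> ".join(path))
```

Lowering `max_hops` below the length of the real attack path makes the search return `None`, which mirrors how the upper bound in `[*1..6]` limits what the Cypher query can find.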

24. Public Knowledge Graphs — DBpedia · Wikidata · YAGO · ConceptNet

| Knowledge Graph | Scale (2026) | Format | Notes |
| --- | --- | --- | --- |
| Wikidata | 1.1B+ triples | RDF, SPARQL | Wikipedia sister project, largest open KG |
| DBpedia | 580M+ triples | RDF, SPARQL | Extracted from Wikipedia infoboxes |
| YAGO 4.5 | 200M+ triples | RDF | Wikidata + WordNet, U. Saarland |
| ConceptNet 5.7 | 34M+ triples | JSON, REST | Common-sense reasoning, MIT |
| ImageNet KG | - | JSON | Visual object classification |
| OpenStreetMap | - | XML, PBF | Geographic graph |

The Wikidata Query Service began migrating off Blazegraph toward a QLever-based backend in 2024, and DBpedia Live reflects Wikipedia edits in near real time. Microsoft Academic Graph was retired in 2021 and has been succeeded by OpenAlex (2022–) and Semantic Scholar.

25. Graph in Korea — NAVER · Kakao · Samsung

NAVER uses an internal knowledge graph extensively across search and services. It consolidates the NAVER Knowledge Encyclopedia with person and place data into a graph that powers the entity Knowledge Panel in search. Kakao Enterprise offers KG-Cloud, a KG SaaS for enterprises. Samsung Research builds the Bixby knowledge graph backend and Galaxy AI context memory as graphs.

POSTECH PRESM (Personal Relevant Search Modeling), KAIST DKE Lab, and Seoul National University BiKE Lab lead academic Korean KG research, and the NIA (National Information Society Agency) public KG initiatives drive adoption in government agencies. Law firms Yulchon and Kim & Chang are evaluating legal-domain knowledge graphs.

26. Graph in Japan — PFN · NTT · Yahoo!Japan

PFN (Preferred Networks) operates the PFN Bio Knowledge Graph for drug-discovery research, integrating chemical structures, proteins, and papers. NTT Knowledge Computing has built its own graph DB and reasoning engine, applied to telecom and healthcare verticals.

Yahoo!Japan (LINE Yahoo) uses an internal knowledge graph to power search and news recommendations, and Rakuten combines a product-catalog graph with a user-behavior graph for recommendations. CyberAgent, DeNA, and Mercari ship graph-driven recommendations in advertising, gaming, and resale marketplaces.

In academia, the National Institute of Informatics (NII), University of Tokyo Hori Lab, and Kyoto University Kyoto KG lead Japanese KG research, and the Japanese Wikidata Activation Project kicked off in 2024 with government support.

27. Visualization — Bloom · Cytoscape · Gephi · yFiles

For graphs, visualization is half the value. Major tools fall into these categories.

| Tool | License | Notes |
| --- | --- | --- |
| Neo4j Bloom | Commercial (bundled with Neo4j) | For business users, natural-language queries |
| NeoDash | Apache 2.0 | Neo4j dashboards |
| Graphileon | Commercial | Graph application builder |
| Linkurious | Commercial | Fraud / intelligence |
| yWorks yFiles | Commercial | JS / HTML / Java / .NET library |
| Cytoscape | OSS, LGPL | Life-sciences standard |
| Cytoscape.js | MIT | Web graph viz |
| Gephi | GPLv3 | Academic / social network analysis |
| Graphviz | EPL | Static diagrams |
| Sigma.js | MIT | Large-scale web viz |
| Apache ECharts Graph | Apache 2.0 | General chart + graph |

In the GraphRAG era, visualization tools are starting to integrate with LLMs so you can give natural-language instructions like "show me a 3-hop neighborhood around Alice." Linkurious's 2024 release is the canonical example.

28. Decision Tree — Which Graph DB Should You Pick

As of May 2026, here is a working recommendation tree.

  • Maturity and ecosystem first -> Neo4j 5 (the safest start)
  • Already on AWS -> Amazon Neptune + Neptune Analytics
  • Already on Postgres -> Apache AGE
  • Need in-memory speed -> Memgraph or FalkorDB
  • Need tens of billions of edges distributed -> TigerGraph or JanusGraph
  • Want documents and KV alongside graphs -> ArangoDB or SurrealDB
  • Local / laptop analytics -> Kuzu (the DuckDB of graphs)
  • GraphQL is your standard interface -> Dgraph / Hypermode
  • RDF / semantic-web standards -> GraphDB or Stardog
  • GraphRAG prototype -> Neo4j + LlamaIndex, or the FalkorDB SDK
  • Public-data graphs -> Wikidata + SPARQL

Three axes always anchor the choice: "query-language preference (Cypher / GQL / Gremlin / SPARQL) + data model (LPG vs RDF) + deployment posture (self-hosted vs managed)."

29. Closing — Graphs Are the Memory of the RAG and Agent Era

It was no accident that the Microsoft GraphRAG paper and the ISO GQL standard both landed in April 2024. As LLMs began to understand the meaning of text, the structure of entities and relationships became important again, and ISO responded by adopting the first new database query standard since SQL.

The 2026 graph DB is no longer a niche tool — it is a standard component of the RAG stack. For a new project, the safest starting point is "Neo4j 5 + Vector Index + LlamaIndex Property Graph"; in an AWS environment Neptune Analytics, and in a Postgres environment Apache AGE + pgvector, are natural picks. If in-memory speed is the absolute priority, Memgraph or FalkorDB; for local analytics, Kuzu.

A decade ago graph DBs were "a niche you reach for when relationships dominate the problem." In 2026 they are "the standard infrastructure for RAG, agent memory, MDM, fraud detection, and drug discovery." The next decade of data infrastructure will be built on three legs: SQL (transactions) + Vector (semantic search) + Graph (relationship reasoning).

References