GraphRAG Complete Guide: How Knowledge Graphs Overcome RAG Limitations


Where Standard RAG Fails

After running vector-search RAG in production, you'll notice it fails consistently on certain question types.

Questions that break standard RAG:

  • "Summarize the overall risk factors across this quarter's reports"
  • "What complaint patterns appear repeatedly in our product reviews?"
  • "How has the company's strategy changed since the CEO transition?"
  • "What are the common failure characteristics between Product A and Product B?"

Why does it fail? These questions share a common property: no single chunk can answer them — you need a cross-document understanding of the entire knowledge base.

Standard RAG's mechanics:

  1. Embed the question
  2. Find the 3-5 most similar chunks by cosine similarity
  3. Generate an answer from those chunks

When you ask "what are the overall risk factors across these reports?" RAG retrieves a section from one report — it can't see patterns that span all the documents. That's the fundamental limitation.
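The three steps above can be sketched with toy vectors (pure Python, made-up "embeddings" purely for illustration) to make the failure mode concrete: top-k similarity retrieval can only surface the chunks nearest the query, never a pattern spread across all of them.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# One chunk per report; each report's risk section has a different "topic"
chunks = {
    "report_A_risk": [0.9, 0.1, 0.0],
    "report_B_risk": [0.1, 0.9, 0.0],
    "report_C_risk": [0.0, 0.1, 0.9],
}
query = [0.95, 0.05, 0.0]  # semantically closest to report A's topic only

# Step 2: rank chunks by cosine similarity, keep top-k
top_k = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)[:1]
print(top_k)  # ['report_A_risk'] — reports B and C never reach the LLM
```

Whatever the LLM generates in step 3, it only ever sees report A's chunk, so a "risk factors across all reports" answer is impossible by construction.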


What Is GraphRAG? (Microsoft Research, 2024)

Microsoft Research published "From Local to Global: A GraphRAG Approach to Query-Focused Summarization" (Edge et al., 2024) introducing this methodology.

The core idea is straightforward: instead of just splitting documents into chunks, extract entities and relationships to build a knowledge graph, then retrieve from that graph.

Standard RAG:
Documents -> Chunk -> Embed -> Vector DB -> Similarity Search -> Relevant Chunks

GraphRAG:
Documents -> Entity Extraction -> Knowledge Graph -> Community Detection
          -> Hierarchical Summaries -> Multi-level Retrieval (Local + Global)

Understanding the Core Structure

1. Entity and Relationship Extraction

GraphRAG uses an LLM to extract entities (people, organizations, places, concepts) and their relationships from documents.

Input text:
"Samsung launched the Galaxy S25 in 2024, competing directly with Apple's
iPhone 16. The Galaxy S25 is powered by the Qualcomm Snapdragon 8 Elite chip."

Extracted entities:
- Samsung (organization)
- Galaxy S25 (product)
- Apple (organization)
- iPhone 16 (product)
- Qualcomm (organization)
- Snapdragon 8 Elite (product/technology)

Extracted relationships:
- Samsung --[launched]--> Galaxy S25
- Galaxy S25 --[competes with]--> iPhone 16
- Galaxy S25 --[powered by]--> Snapdragon 8 Elite
- Qualcomm --[manufactures]--> Snapdragon 8 Elite
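A minimal sketch of what this extraction step looks like, with the LLM response hard-coded rather than fetched from a real model. The prompt text and JSON schema here are illustrative assumptions, not GraphRAG's actual prompts:

```python
import json

# Hypothetical extraction prompt; GraphRAG's real prompts are more elaborate
EXTRACTION_PROMPT = """Extract entities (name, type) and relationships
(source, relation, target) from the text below as JSON.
Text: {text}"""

def extract(text: str, llm=None) -> dict:
    # `llm` would be a callable hitting a real model; its answer is stubbed
    # here so the shape of the output is visible without an API call
    fake_response = json.dumps({
        "entities": [
            {"name": "Samsung", "type": "organization"},
            {"name": "Galaxy S25", "type": "product"},
        ],
        "relationships": [
            {"source": "Samsung", "relation": "launched", "target": "Galaxy S25"},
        ],
    })
    raw = llm(EXTRACTION_PROMPT.format(text=text)) if llm else fake_response
    return json.loads(raw)

piece = extract("Samsung launched the Galaxy S25 in 2024.")
triples = [(r["source"], r["relation"], r["target"])
           for r in piece["relationships"]]
print(triples)  # [('Samsung', 'launched', 'Galaxy S25')]
```

Each document contributes a handful of these triples, and merging them across all documents is what produces the graph.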

2. Community Detection

Using algorithms like Leiden on the entity graph, GraphRAG finds clusters of closely connected entities (communities).

Examples: smartphone market community, semiconductor community, software ecosystem community.
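Community detection is easy to see on a toy version of the entity graph from the previous section. GraphRAG itself runs Leiden; networkx's modularity-based detector is used here as a stand-in for illustration:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy entity graph: a smartphone cluster and a chip cluster, joined by one edge
G = nx.Graph()
G.add_edges_from([
    ("Samsung", "Galaxy S25"), ("Apple", "iPhone 16"),
    ("Galaxy S25", "iPhone 16"),                              # smartphone side
    ("Qualcomm", "Snapdragon 8 Elite"), ("TSMC", "Qualcomm"), # chip side
    ("Galaxy S25", "Snapdragon 8 Elite"),                     # bridge edge
])

# Leiden stand-in: greedily merge nodes to maximize modularity
communities = greedy_modularity_communities(G)
for i, members in enumerate(communities):
    print(f"Community {i}: {sorted(members)}")
```

On this graph the detector separates the phone-market entities from the chip-supply entities, which is exactly the grouping the community summaries are later written over.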

3. Hierarchical Summaries

Summaries are generated for each community at multiple granularity levels:

Level 0 (most detailed): individual entity/relationship summaries
Level 1: small community summaries
Level 2: mid-scale community summaries (most commonly used)
Level 3 (most comprehensive): global topic summaries
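The hierarchy can be pictured as a level-indexed collection of summaries (the contents below are entirely hypothetical), where raising the level trades detail for coverage:

```python
# Hypothetical hierarchy: higher levels hold fewer, broader summaries
hierarchy = {
    0: ["Samsung launched the Galaxy S25",
        "Snapdragon 8 Elite powers the Galaxy S25",
        "Apple sells the iPhone 16",
        "Qualcomm designs the Snapdragon 8 Elite"],
    1: ["Smartphone market: Samsung vs Apple",
        "Mobile chip supply chain"],
    2: ["Consumer electronics ecosystem"],
}

def summaries_for(level: int) -> list:
    """Summaries a query run at community_level=level would read."""
    return hierarchy[level]

print([len(summaries_for(l)) for l in sorted(hierarchy)])  # [4, 2, 1]
```

Choosing `community_level` at query time is essentially choosing which row of this structure the search reads from.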

GraphRAG in Code

Using Microsoft's official graphrag library:

# Install
pip install graphrag

# Initialize project
mkdir my-graphrag-project
graphrag init --root ./my-graphrag-project

Key configuration in the generated settings.yaml:

# Key settings in settings.yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini  # for indexing (mini to reduce cost)
  model_supports_json: true

embeddings:
  llm:
    model: text-embedding-3-small

input:
  type: file
  file_type: text
  base_dir: "input"  # put your documents here

chunks:
  size: 1200
  overlap: 100

# Indexing (building the knowledge graph)
# graphrag index --root ./my-graphrag-project
# What this does internally:
# 1. Chunk documents
# 2. Extract entities and relationships (LLM calls)
# 3. Build the graph
# 4. Run community detection
# 5. Generate summaries for each community (LLM calls)

# Querying via Python API
# (the query API surface has shifted between graphrag releases; the calls
#  below follow one version and may need adjusting for yours)
import asyncio
from graphrag.query.api import local_search, global_search

# Local Search: entity/relationship-focused questions
async def search_local(query: str):
    result = await local_search(
        config_dir="./my-graphrag-project",
        data_dir="./my-graphrag-project/output",
        root_dir="./my-graphrag-project",
        community_level=2,
        response_type="multiple paragraphs",
        query=query,
    )
    return result.response

# Global Search: questions requiring synthesis across the full knowledge base
async def search_global(query: str):
    result = await global_search(
        config_dir="./my-graphrag-project",
        data_dir="./my-graphrag-project/output",
        root_dir="./my-graphrag-project",
        community_level=2,
        response_type="multiple paragraphs",
        query=query,
    )
    return result.response

# Usage
local_result = asyncio.run(search_local(
    "What is the relationship between Samsung and TSMC?"
))

global_result = asyncio.run(search_global(
    "What are the main trends in the semiconductor industry across these documents?"
))

Local vs Global Search: When to Use Which

Understanding the two query modes is the key to using GraphRAG effectively.

| Query type | Best mode | Example |
| --- | --- | --- |
| Question about a specific entity | Local Search | "What is John Kim's role?" |
| Relationship between two entities | Local Search | "What's the Samsung-TSMC relationship?" |
| Overall trend identification | Global Search | "What are the main themes in these docs?" |
| Cross-document pattern analysis | Global Search | "What are the industry-wide risk factors?" |
| Changes over time | Both | "How did strategy evolve over 5 years?" |

Local Search internally:

  1. Find relevant entities from the query
  2. Collect text chunks, relationships, and community summaries connected to those entities
  3. Generate final answer with LLM
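A hypothetical sketch of those three steps. Real GraphRAG matches entities via embeddings and ranks context by relationship weight; this toy version uses plain substring matching and an identity function in place of the LLM:

```python
def local_search(query, graph, chunks_by_entity, llm):
    # 1. Find entities mentioned in the query (embedding match in real GraphRAG)
    matched = [e for e in graph if e.lower() in query.lower()]
    # 2. Collect text chunks attached to those entities and their neighbors
    context = []
    for e in matched:
        context += chunks_by_entity.get(e, [])
        for neighbor in graph[e]:
            context += chunks_by_entity.get(neighbor, [])
    # 3. One LLM call over the deduplicated context
    return llm(f"Answer {query!r} using:\n" + "\n".join(dict.fromkeys(context)))

# Toy adjacency list and per-entity chunks (made-up contents)
graph = {"Samsung": ["TSMC"], "TSMC": ["Samsung"]}
chunks = {"Samsung": ["Samsung outsources some fabrication."],
          "TSMC": ["TSMC fabricates advanced chips."]}

answer = local_search("What is the relationship between Samsung and TSMC?",
                      graph, chunks, llm=lambda p: p)  # identity "LLM" for demo
```

The point is the shape of the context: both entities' chunks plus their graph neighborhood land in a single prompt, which is why entity-centric questions work well in this mode.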

Global Search internally:

  1. Split all community summaries into "batches"
  2. Generate partial answers for each batch (Map phase)
  3. Merge partial answers into final answer (Reduce phase)
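The map-reduce shape of Global Search is easy to sketch; `llm` here is any callable, stubbed below to count calls rather than hit a real model:

```python
def global_search(query, community_summaries, llm, batch_size=3):
    # Map phase: one LLM call per batch of community summaries
    partials = [
        llm(f"Answer {query!r} using:\n"
            + "\n".join(community_summaries[i:i + batch_size]))
        for i in range(0, len(community_summaries), batch_size)
    ]
    # Reduce phase: one LLM call merging the partial answers
    return llm(f"Merge these partial answers to {query!r}:\n" + "\n".join(partials))

# Stub "LLM" that records every prompt, to expose the call pattern
calls = []
stub_llm = lambda prompt: (calls.append(prompt) or f"answer-{len(calls)}")

result = global_search("main trends?", [f"community {i}" for i in range(7)], stub_llm)
print(f"{len(calls)} LLM calls")  # 7 summaries / batch of 3 -> 3 map + 1 reduce = 4
```

This call pattern is also why Global Search costs scale with the number of community summaries, not with the question: every global query re-reads the whole summary set.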

Exploring the Knowledge Graph Directly

Visualizing GraphRAG's generated graph reveals insights about your data.

import pandas as pd
import networkx as nx

# Load entities and relationships from GraphRAG output
entities_df = pd.read_parquet("./output/entities.parquet")
relationships_df = pd.read_parquet("./output/relationships.parquet")

print(f"Total entities: {len(entities_df)}")
print(f"Total relationships: {len(relationships_df)}")

# Build NetworkX graph
G = nx.DiGraph()

for _, entity in entities_df.iterrows():
    G.add_node(entity["title"], type=entity["type"])

for _, rel in relationships_df.iterrows():
    G.add_edge(
        rel["source"],
        rel["target"],
        weight=rel["weight"],
        description=rel["description"]
    )

# Find hub entities (most connected)
top_hubs = sorted(G.degree(), key=lambda x: x[1], reverse=True)[:10]
print("Top 10 most connected entities:")
for entity, degree in top_hubs:
    print(f"  {entity}: {degree} connections")

GraphRAG vs Standard RAG: Performance Comparison

From Microsoft's original paper (evaluated on two ~1M-token corpora of podcast transcripts and news articles, judged head-to-head by an LLM):

Global queries (requiring whole-corpus synthesis):
- Standard RAG:  comprehensiveness ~40% win rate, diversity ~57%
- GraphRAG:      comprehensiveness ~72% win rate, diversity ~62%
  -> an 80% relative improvement in comprehensiveness

Local queries (specific fact retrieval; rough figures, not from the paper):
- Standard RAG:  accuracy ~65%
- GraphRAG:      accuracy ~70%
  -> Modest improvement

Latency:
- Standard RAG:  0.5-2 seconds
- GraphRAG:      3-10 seconds (due to community summary aggregation)

Conclusion: GraphRAG wins decisively on global queries; for specific fact retrieval it is only marginally more accurate while being noticeably slower than standard RAG.


Cost Reality: Honest Numbers

GraphRAG's biggest drawback is indexing cost.

# GraphRAG indexing cost estimate
# Assumptions: 1,000 documents, 1,000 tokens each

num_documents = 1000
tokens_per_doc = 1000
total_tokens = num_documents * tokens_per_doc  # 1M tokens

# Estimated LLM calls during indexing
entity_extraction_calls = total_tokens / 1200  # ~1 call per 1200-token chunk
# Each call: ~800 input tokens + ~400 output tokens

community_summary_calls = 200  # rough: number of communities x levels
# Each call: ~2000 input tokens + ~500 output tokens

# Cost using GPT-4o-mini ($0.15/1M input, $0.60/1M output)
entity_input_cost = (entity_extraction_calls * 800 / 1_000_000) * 0.15
entity_output_cost = (entity_extraction_calls * 400 / 1_000_000) * 0.60
community_input_cost = (community_summary_calls * 2000 / 1_000_000) * 0.15
community_output_cost = (community_summary_calls * 500 / 1_000_000) * 0.60

total_indexing_cost = (entity_input_cost + entity_output_cost
                       + community_input_cost + community_output_cost)
print(f"Estimated indexing cost: ${total_indexing_cost:.2f}")  # ~$0.42
# Real runs cost several times this: extraction makes multiple "gleaning"
# passes per chunk, plus retries. Realistic budgets:
# 1,000 documents:   approx $1-5 (using GPT-4o-mini)
# 10,000 documents:  approx $10-50
# 100,000 documents: approx $100-500

# Query cost (Global Search)
# Global search processes all community summaries for each query
# 200 communities x 500 tokens each = ~100K tokens processed per query
global_query_cost = (100_000 / 1_000_000) * 2.50  # GPT-4o pricing
print(f"Global search cost per query: ${global_query_cost:.3f}")  # $0.25/query

Is this acceptable? Depends on your situation:

  • Initial indexing: one-time cost if your documents don't change often
  • Query cost: Global search can be 10-100x more expensive per query than standard RAG

GraphRAG vs Standard RAG: When to Use Which

GraphRAG is worth it when:

  • Analyzing patterns/trends across hundreds to thousands of documents
  • Financial reports, patents, legal document analysis
  • Knowledge base is relatively static (indexing cost amortizes)
  • Broad exploratory questions are common ("what do we know about this company?")

Standard RAG is sufficient when:

  • Specific fact retrieval ("what's the expiration date on this contract?")
  • Real-time services requiring fast response
  • Frequently updated documents (re-indexing cost)
  • Small knowledge base (under ~100 documents)
  • Tight cost budget

LightRAG: A More Practical Alternative

If Microsoft's GraphRAG feels too complex or expensive, consider LightRAG.

# pip install lightrag-hku
# (import paths have moved between LightRAG releases; this follows the
#  original README API)
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete

rag = LightRAG(
    working_dir="./lightrag-storage",
    llm_model_func=gpt_4o_mini_complete,
)

# Insert documents
with open("document.txt", "r") as f:
    rag.insert(f.read())

# Query (4 modes: naive, local, global, hybrid)
result = rag.query(
    "What are the main trends?",
    param=QueryParam(mode="global")  # hybrid mode is also effective
)

LightRAG is simpler to implement, cheaper to run, and delivers sufficient performance for small to medium knowledge bases.


Conclusion: GraphRAG Is a Tool, Not a Silver Bullet

GraphRAG is powerful, but it's not the right choice for every situation.

Summary:

  • Significantly outperforms standard RAG on global queries
  • High indexing cost means it's best when the knowledge base is stable
  • Query cost is also higher than standard RAG
  • To get started quickly: use Microsoft's graphrag library or LightRAG

Think about ROI. Adopt GraphRAG when you have a clear requirement like "we need to understand patterns across this corpus." For specific fact retrieval, a well-tuned standard RAG pipeline is more cost-effective.