- Author: Youngju Kim (@fjvbn20031)
- Where Standard RAG Fails
- What Is GraphRAG? (Microsoft Research, 2024)
- Understanding the Core Structure
- GraphRAG in Code
- Local vs Global Search: When to Use Which
- Exploring the Knowledge Graph Directly
- GraphRAG vs Standard RAG: Performance Comparison
- Cost Reality: Honest Numbers
- GraphRAG vs Standard RAG: When to Use Which
- LightRAG: A More Practical Alternative
- Conclusion: GraphRAG Is a Tool, Not a Silver Bullet
Where Standard RAG Fails
After running vector-search RAG in production, you'll notice it fails consistently on certain question types.
Questions that break standard RAG:
- "Summarize the overall risk factors across this quarter's reports"
- "What complaint patterns appear repeatedly in our product reviews?"
- "How has the company's strategy changed since the CEO transition?"
- "What are the common failure characteristics between Product A and Product B?"
Why does it fail? These questions share a common property: no single chunk can answer them — you need a cross-document understanding of the entire knowledge base.
Standard RAG's mechanics:
- Embed the question
- Find the 3-5 most similar chunks by cosine similarity
- Generate an answer from those chunks
When you ask "what are the overall risk factors across these reports?" RAG retrieves a section from one report — it can't see patterns that span all the documents. That's the fundamental limitation.
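That retrieval pipeline can be sketched in a few lines. The bag-of-words `embed` function below is a toy stand-in for a real embedding model, and the three-chunk corpus is invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank every chunk by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Report A: supply chain risk factors rose in Q3.",
    "Report B: currency risk dominated Q3 results.",
    "Report C: the new campus opened in Austin.",
]
top = retrieve("What are the overall risk factors this quarter?", chunks, k=2)
```

Only the top-k chunks ever reach the LLM: a pattern spread across all of the reports never makes it into the prompt.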
What Is GraphRAG? (Microsoft Research, 2024)
Microsoft Research published "From Local to Global: A GraphRAG Approach to Query-Focused Summarization" (Edge et al., 2024) introducing this methodology.
The core idea is straightforward: instead of just splitting documents into chunks, extract entities and relationships to build a knowledge graph, then retrieve from that graph.
Standard RAG:
Documents -> Chunk -> Embed -> Vector DB -> Similarity Search -> Relevant Chunks
GraphRAG:
Documents -> Entity Extraction -> Knowledge Graph -> Community Detection -> Hierarchical Summaries -> Multi-level Retrieval (Local + Global)
Understanding the Core Structure
1. Entity and Relationship Extraction
GraphRAG uses an LLM to extract entities (people, organizations, places, concepts) and their relationships from documents.
Input text:
"Samsung launched the Galaxy S25 in 2024, competing directly with Apple's
iPhone 16. The Galaxy S25 is powered by the Qualcomm Snapdragon 8 Elite chip."
Extracted entities:
- Samsung (organization)
- Galaxy S25 (product)
- Apple (organization)
- iPhone 16 (product)
- Qualcomm (organization)
- Snapdragon 8 Elite (product/technology)
Extracted relationships:
- Samsung --[launched]--> Galaxy S25
- Galaxy S25 --[competes with]--> iPhone 16
- Galaxy S25 --[powered by]--> Snapdragon 8 Elite
- Qualcomm --[manufactures]--> Snapdragon 8 Elite
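In practice the LLM is prompted to return structured output that is then parsed into graph triples. A sketch of that parsing step; the JSON shape here is illustrative, not GraphRAG's actual extraction format:

```python
import json

# Illustrative LLM response; GraphRAG's real extraction prompt and output
# format are defined by its own prompt templates.
llm_response = json.dumps({
    "entities": [
        {"name": "Samsung", "type": "organization"},
        {"name": "Galaxy S25", "type": "product"},
        {"name": "Snapdragon 8 Elite", "type": "product/technology"},
    ],
    "relationships": [
        {"source": "Samsung", "relation": "launched", "target": "Galaxy S25"},
        {"source": "Galaxy S25", "relation": "powered by", "target": "Snapdragon 8 Elite"},
    ],
})

data = json.loads(llm_response)
# Each relationship becomes a (source, relation, target) triple, i.e. an edge.
triples = [(r["source"], r["relation"], r["target"]) for r in data["relationships"]]
```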
2. Community Detection
Using algorithms like Leiden on the entity graph, GraphRAG finds clusters of closely connected entities (communities).
Examples: smartphone market community, semiconductor community, software ecosystem community.
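A toy version of this step, using NetworkX's greedy modularity algorithm as a stand-in for the Leiden algorithm GraphRAG actually runs (the entity graph below is invented):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy entity graph mirroring the extraction example above.
G = nx.Graph()
G.add_edges_from([
    ("Samsung", "Galaxy S25"), ("Apple", "iPhone 16"),
    ("Galaxy S25", "iPhone 16"), ("Galaxy S25", "Snapdragon 8 Elite"),
    ("Qualcomm", "Snapdragon 8 Elite"), ("TSMC", "Snapdragon 8 Elite"),
])

# Partition the graph into clusters of densely connected entities.
communities = greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
```

Each resulting cluster (e.g. the handset makers, the chip suppliers) is what GraphRAG later summarizes.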
3. Hierarchical Summaries
Summaries are generated for each community at multiple granularity levels:
Level 0 (most detailed): individual entity/relationship summaries
Level 1: small community summaries
Level 2: mid-scale community summaries (most commonly used)
Level 3 (most comprehensive): global topic summaries
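The `community_level` parameter that appears later in the query API selects one tier of this hierarchy. A toy illustration; the summaries here are invented (real ones are LLM-generated):

```python
# Hypothetical summary hierarchy keyed by level; higher levels mean fewer,
# broader summaries covering more of the corpus.
community_summaries = {
    0: ["Samsung launched the Galaxy S25.", "Qualcomm makes the Snapdragon 8 Elite."],
    1: ["The Galaxy S25 and iPhone 16 compete in the flagship segment."],
    2: ["The smartphone market is driven by Samsung-Apple competition and chip supply."],
    3: ["Consumer electronics trends: AI features, vertical integration, chip dependence."],
}

def summaries_at(level: int) -> list[str]:
    # Pick the tier of the hierarchy that retrieval will read from.
    return community_summaries[level]
```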
GraphRAG in Code
Using Microsoft's official graphrag library:
# Install
pip install graphrag
# Initialize project
mkdir my-graphrag-project
graphrag init --root ./my-graphrag-project
The generated settings.yaml key configuration:
# Key settings in settings.yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini  # for indexing (mini to reduce cost)
  model_supports_json: true
embeddings:
  llm:
    model: text-embedding-3-small
input:
  type: file
  file_type: text
  base_dir: "input"  # put your documents here
chunks:
  size: 1200
  overlap: 100
# Indexing (building the knowledge graph)
# graphrag index --root ./my-graphrag-project
# What this does internally:
# 1. Chunk documents
# 2. Extract entities and relationships (LLM calls)
# 3. Build the graph
# 4. Run community detection
# 5. Generate summaries for each community (LLM calls)
# Querying via Python API
import asyncio
from graphrag.query.api import local_search, global_search
# Local Search: entity/relationship-focused questions
async def search_local(query: str):
    result = await local_search(
        config_dir="./my-graphrag-project",
        data_dir="./my-graphrag-project/output",
        root_dir="./my-graphrag-project",
        community_level=2,
        response_type="multiple paragraphs",
        query=query,
    )
    return result.response
# Global Search: questions requiring synthesis across the full knowledge base
async def search_global(query: str):
    result = await global_search(
        config_dir="./my-graphrag-project",
        data_dir="./my-graphrag-project/output",
        root_dir="./my-graphrag-project",
        community_level=2,
        response_type="multiple paragraphs",
        query=query,
    )
    return result.response
# Usage
local_result = asyncio.run(search_local(
    "What is the relationship between Samsung and TSMC?"
))
global_result = asyncio.run(search_global(
    "What are the main trends in the semiconductor industry across these documents?"
))
Local vs Global Search: When to Use Which
Understanding the two query modes is the key to using GraphRAG effectively.
| Query Type | Best Mode | Example |
|---|---|---|
| Question about a specific entity | Local Search | "What is John Kim's role?" |
| Relationship between two entities | Local Search | "What's the Samsung-TSMC relationship?" |
| Overall trend identification | Global Search | "What are the main themes in these docs?" |
| Cross-document pattern analysis | Global Search | "What are the industry-wide risk factors?" |
| Changes over time | Both | "How did strategy evolve over 5 years?" |
Local Search internally:
- Find relevant entities from the query
- Collect text chunks, relationships, and community summaries connected to those entities
- Generate final answer with LLM
Global Search internally:
- Split all community summaries into "batches"
- Generate partial answers for each batch (Map phase)
- Merge partial answers into final answer (Reduce phase)
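The map-reduce flow above can be sketched as follows; `answer_batch` is a placeholder for what would be an LLM call in the real system:

```python
from typing import Callable

def global_search_sketch(
    community_summaries: list[str],
    query: str,
    answer_batch: Callable[[list[str], str], str],
    batch_size: int = 3,
) -> str:
    # Map: produce a partial answer for each batch of community summaries.
    batches = [
        community_summaries[i : i + batch_size]
        for i in range(0, len(community_summaries), batch_size)
    ]
    partials = [answer_batch(batch, query) for batch in batches]
    # Reduce: merge the partial answers into one final answer. In real
    # GraphRAG this merge is itself an LLM call that ranks and synthesizes.
    return answer_batch(partials, query)

# Usage with a stub "LLM" that just concatenates its inputs:
stub = lambda texts, q: " | ".join(texts)
final = global_search_sketch(
    [f"summary {i}" for i in range(7)], "main trends?", stub, batch_size=3
)
```

This is also why Global Search is slow and expensive: every query touches every community summary.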
Exploring the Knowledge Graph Directly
Visualizing GraphRAG's generated graph reveals insights about your data.
import pandas as pd
import networkx as nx
# Load entities and relationships from GraphRAG output
entities_df = pd.read_parquet("./output/entities.parquet")
relationships_df = pd.read_parquet("./output/relationships.parquet")
print(f"Total entities: {len(entities_df)}")
print(f"Total relationships: {len(relationships_df)}")
# Build NetworkX graph
G = nx.DiGraph()
for _, entity in entities_df.iterrows():
    G.add_node(entity["title"], type=entity["type"])
for _, rel in relationships_df.iterrows():
    G.add_edge(
        rel["source"],
        rel["target"],
        weight=rel["weight"],
        description=rel["description"],
    )
# Find hub entities (most connected)
top_hubs = sorted(G.degree(), key=lambda x: x[1], reverse=True)[:10]
print("Top 10 most connected entities:")
for entity, degree in top_hubs:
    print(f"  {entity}: {degree} connections")
GraphRAG vs Standard RAG: Performance Comparison
Approximate figures, based on the evaluation in Microsoft's original paper:
Global queries (requiring whole-document synthesis):
- Standard RAG: comprehensiveness 40%, diversity 57%
- GraphRAG: comprehensiveness 72%, diversity 62%
-> 80% improvement in comprehensiveness!
Local queries (specific fact retrieval):
- Standard RAG: accuracy ~65%
- GraphRAG: accuracy ~70%
-> Modest improvement
Latency:
- Standard RAG: 0.5-2 seconds
- GraphRAG: 3-10 seconds (due to community summary aggregation)
Bottom line: GraphRAG dominates on global queries, but for specific fact retrieval it is only marginally more accurate than standard RAG, and noticeably slower.
Cost Reality: Honest Numbers
GraphRAG's biggest drawback is indexing cost.
# GraphRAG indexing cost estimate
# Assumptions: 1,000 documents, 1,000 tokens each
num_documents = 1000
tokens_per_doc = 1000
total_tokens = num_documents * tokens_per_doc # 1M tokens
# Estimated LLM calls during indexing
entity_extraction_calls = total_tokens // 1200  # ~1 call per chunk
# Each call: ~800 input tokens + ~400 output tokens
community_summary_calls = 200  # number of communities x levels
# Each call: ~2,000 input tokens + ~500 output tokens
# Cost using GPT-4o-mini ($0.15/1M input, $0.60/1M output)
entity_input_cost = (entity_extraction_calls * 800 / 1_000_000) * 0.15
entity_output_cost = (entity_extraction_calls * 400 / 1_000_000) * 0.60
community_input_cost = (community_summary_calls * 2000 / 1_000_000) * 0.15
community_output_cost = (community_summary_calls * 500 / 1_000_000) * 0.60
total_indexing_cost = (entity_input_cost + entity_output_cost
                       + community_input_cost + community_output_cost)
print(f"Estimated indexing cost: ${total_indexing_cost:.2f}")
# Real runs cost several times this lower bound, because GraphRAG re-prompts
# each chunk for missed entities ("gleanings") and retries failed calls:
# 1,000 documents: approx $1-5 (using GPT-4o-mini)
# 10,000 documents: approx $10-50
# 100,000 documents: approx $100-500
# Query cost (Global Search)
# Global search processes all community summaries for each query
# 200 communities x 500 tokens each = ~100K tokens processed per query
global_query_cost = (100_000 / 1_000_000) * 2.50 # GPT-4o pricing
print(f"Global search cost per query: ${global_query_cost:.3f}") # $0.25/query
Is this acceptable? Depends on your situation:
- Initial indexing: one-time cost if your documents don't change often
- Query cost: Global search can be 10-100x more expensive per query than standard RAG
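The 10-100x figure follows from token volume alone. A back-of-envelope check, where the per-query chunk and summary counts are rough assumptions:

```python
# Rough assumption: a standard RAG query sends ~5 retrieved chunks of ~400
# tokens each to the LLM, while global search reads every community summary.
standard_rag_tokens = 5 * 400      # ~2K tokens per query
global_search_tokens = 200 * 500   # ~100K tokens per query

ratio = global_search_tokens / standard_rag_tokens
print(f"Global search processes ~{ratio:.0f}x more tokens per query")
```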
GraphRAG vs Standard RAG: When to Use Which
GraphRAG is worth it when:
- Analyzing patterns/trends across hundreds to thousands of documents
- Financial reports, patents, legal document analysis
- Knowledge base is relatively static (indexing cost amortizes)
- Broad exploratory questions are common ("what do we know about this company?")
Standard RAG is sufficient when:
- Specific fact retrieval ("what's the expiration date on this contract?")
- Real-time services requiring fast response
- Frequently updated documents (re-indexing cost)
- Small knowledge base (under ~100 documents)
- Tight cost budget
LightRAG: A More Practical Alternative
If Microsoft's GraphRAG feels too complex or expensive, consider LightRAG.
# pip install lightrag-hku
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete
rag = LightRAG(
    working_dir="./lightrag-storage",
    llm_model_func=gpt_4o_mini_complete,
)
# Insert documents
with open("document.txt", "r") as f:
    rag.insert(f.read())
# Query (4 modes: naive, local, global, hybrid)
result = rag.query(
    "What are the main trends?",
    param=QueryParam(mode="global"),  # hybrid mode is also effective
)
LightRAG is simpler to implement, cheaper to run, and delivers sufficient performance for small to medium knowledge bases.
Conclusion: GraphRAG Is a Tool, Not a Silver Bullet
GraphRAG is powerful, but it's not the right choice for every situation.
Summary:
- Significantly outperforms standard RAG on global queries
- High indexing cost means it's best when the knowledge base is stable
- Query cost is also higher than standard RAG
- To get started quickly: use Microsoft's graphrag library or LightRAG
Think about ROI. Adopt GraphRAG when you have a clear requirement like "we need to understand patterns across this corpus." For specific fact retrieval, a well-tuned standard RAG pipeline is more cost-effective.