- Author: Youngju Kim (@fjvbn20031)
- Why Memory Is the Hardest Problem in AI Agents
- Four Memory Types Borrowed From Human Psychology
- Production Memory Architecture
- Mem0: The Modern Memory Framework
- The Hard Problems in Memory Systems
- Which Memory Strategy Should You Choose?
- Wrapping Up
Why Memory Is the Hardest Problem in AI Agents
"Hi, my name is Alex." "Hello, Alex!" [conversation ends]
[new session starts] "Do you know my name?" "I'm sorry, I don't have any record of previous conversations..."
LLMs have no memory by default. Every conversation is the first conversation. This is one of the most fundamental limitations of current AI assistants.
The things we take for granted in human relationships — "what you told me last time," "what you like," "the problem we solved together" — AI can't do these by default.
That's why agent memory systems matter. This post covers everything from theory to production implementation.
Four Memory Types Borrowed From Human Psychology
The way psychologists classify human memory turns out to be surprisingly useful for designing AI memory systems.
1. Sensory Memory → Context Window
Sensory memory holds incoming information for a very brief time (0.5 to 3 seconds). In LLMs, this corresponds to the current input — the contents of the context window.
# This IS sensory memory:
messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Hello, Alex!"},
    {"role": "user", "content": "What's my name?"},
    # The LLM can see the conversation above, so it can answer
]
Characteristics: fast access, but disappears when the conversation ends. Limited by context window size.
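Because the window is finite, a common pattern is to trim the oldest messages so the conversation fits a token budget. A minimal sketch, using a crude characters-per-token estimate instead of a real tokenizer (the budget and ratio here are illustrative, not from any specific model):

```python
def trim_to_budget(messages, max_tokens=4000, chars_per_token=4):
    """Drop the oldest messages until a rough token estimate fits the budget."""
    def estimate(msgs):
        # Crude heuristic; a real system would use the model's own tokenizer
        return sum(len(m["content"]) for m in msgs) // chars_per_token

    trimmed = list(messages)
    while len(trimmed) > 1 and estimate(trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest message falls out of "sensory memory"
    return trimmed
```

This is the simplest possible eviction policy; production systems usually summarize evicted messages instead of discarding them outright.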
2. Short-term / Working Memory → Conversation Buffer
Short-term memory temporarily holds information while working. The conversation buffer serves this role.
from langchain.memory import ConversationBufferWindowMemory

# Keep only the most recent 10 conversation exchanges
memory = ConversationBufferWindowMemory(k=10, return_messages=True)

# Add a conversation turn
memory.save_context(
    {"input": "What's list comprehension in Python?"},
    {"output": "List comprehension is a concise way to create a new list based on an existing one."}
)

# Load memory (inject into the LLM prompt)
history = memory.load_memory_variables({})
Characteristics: maintains session context. But once it exceeds k entries, older ones are dropped. Cannot persist long-term.
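The same windowing behavior is easy to implement without a framework, which makes the drop-off behavior explicit. A framework-free sketch using a bounded deque:

```python
from collections import deque

class WindowBuffer:
    """Keeps only the last k exchanges, like ConversationBufferWindowMemory."""

    def __init__(self, k=10):
        # maxlen makes the deque discard the oldest entry automatically
        self.exchanges = deque(maxlen=k)

    def save(self, user_input, output):
        self.exchanges.append({"input": user_input, "output": output})

    def load(self):
        return list(self.exchanges)
```

With `k=2`, saving three exchanges leaves only the last two; the first is silently gone, which is exactly the "cannot persist long-term" limitation described above.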
3. Long-term Memory → Vector Store + External DB
Long-term memory stores information over time and retrieves it when needed. Vector stores play this role.
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
# Create long-term memory store
embedding = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embedding)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Save to memory
memory.save_context(
    {"input": "My favorite programming language is Python"},
    {"output": "Got it! You love Python."}
)
memory.save_context(
    {"input": "I live in Seoul and work at a startup"},
    {"output": "A startup in Seoul — noted!"}
)

# Later: retrieve relevant memories
relevant = memory.load_memory_variables({"prompt": "Recommend some tech for me"})
# Returns the memories about Python and the Seoul startup
Key advantage: can pull only relevant memories from millions of stored entries. Retrieval based on semantic similarity.
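Under the hood, "semantic similarity" usually means cosine similarity between embedding vectors. A toy illustration with hand-made 3-dimensional vectors standing in for real embedding output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for real embedding output
query = [0.9, 0.1, 0.0]           # "recommend a programming tool"
memory_python = [0.8, 0.2, 0.1]   # "favorite language is Python"
memory_seoul = [0.1, 0.1, 0.9]    # "lives in Seoul"

# The Python memory points in nearly the same direction as the query
print(cosine_similarity(query, memory_python) > cosine_similarity(query, memory_seoul))  # True
```

Real embeddings have hundreds or thousands of dimensions, and vector stores like FAISS exist to make this comparison fast over millions of entries.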
4. Episodic Memory → Structured Event Log
Episodic memory records specific events and experiences. Time and context are included.
import json
from datetime import datetime
from typing import Optional

class EpisodicMemory:
    def __init__(self, db_connection):
        self.db = db_connection

    def record_episode(self, user_id: str, episode: dict):
        """Record a specific event or interaction."""
        self.db.insert("episodes", {
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "event_type": episode["type"],  # "purchase", "complaint", "success"
            "summary": episode["summary"],
            "metadata": json.dumps(episode.get("metadata", {}))
        })

    def retrieve_episodes(
        self,
        user_id: str,
        event_type: Optional[str] = None,
        limit: int = 10
    ):
        """Retrieve a user's past episodes, newest first."""
        if event_type:
            query = """
                SELECT * FROM episodes
                WHERE user_id = ? AND event_type = ?
                ORDER BY timestamp DESC LIMIT ?
            """
            return self.db.query(query, [user_id, event_type, limit])
        query = """
            SELECT * FROM episodes
            WHERE user_id = ?
            ORDER BY timestamp DESC LIMIT ?
        """
        return self.db.query(query, [user_id, limit])

# Example usage (db is any connection object exposing insert/query)
memory = EpisodicMemory(db)
memory.record_episode("user_123", {
    "type": "purchase",
    "summary": "User purchased advanced Python course",
    "metadata": {"course_id": "py-advanced", "price": 89}
})
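The `db_connection` here is assumed to expose `insert` and `query` methods; those names are this post's convention, not a standard API. A minimal sqlite3-backed adapter that satisfies that interface might look like:

```python
import sqlite3

class SQLiteAdapter:
    """Tiny adapter exposing the insert/query interface EpisodicMemory expects."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows become dict-convertible
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS episodes (
                user_id TEXT, timestamp TEXT, event_type TEXT,
                summary TEXT, metadata TEXT
            )
        """)

    def insert(self, table, row):
        cols = ", ".join(row)
        placeholders = ", ".join("?" for _ in row)
        self.conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
            list(row.values()),
        )
        self.conn.commit()

    def query(self, sql, params):
        return [dict(r) for r in self.conn.execute(sql, params).fetchall()]
```

Any real database (Postgres, DynamoDB, ...) works the same way as long as the adapter keeps this two-method contract.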
Production Memory Architecture
Enough theory — let's see how to combine these in practice.
User Message
      |
      v
+--------------------------------------------------+
|                  Memory Router                   |
|      (gather relevant memories, prioritize)      |
+-----------+-----------+-----------+--------------+
      |           |           |            |
      v           v           v            v
   Sensory    Short-term   Long-term    Episodic
  (context)    (buffer)    (vector)   (event log)
      |           |           |            |
      +-----------+-----------+------------+
                        |
                        v
             Consolidated Context
        (select most relevant memories)
                        |
                        v
                 LLM Generation
                        |
                        v
                 Memory Update
      (store new info in the right memory type)
In short:
- When a user message arrives, the Memory Router pulls relevant content from each memory type
- The most relevant memories are selected and assembled as context
- The LLM generates a response using that context
- New information gets stored in the appropriate memory type
In actual code:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import (
    ConversationBufferWindowMemory,
    VectorStoreRetrieverMemory,
    CombinedMemory
)
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain

# Configure memory. Each memory needs its own memory_key, and an explicit
# input_key so CombinedMemory knows which prompt variable is the user input.
short_term = ConversationBufferWindowMemory(
    k=5,
    memory_key="chat_history",
    input_key="input"
)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
long_term = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="relevant_history",
    input_key="input"
)

# Combine both memories
combined_memory = CombinedMemory(memories=[short_term, long_term])

# Set up the prompt
template = """
You are a personal AI assistant who remembers the user well.

Past relevant memories:
{relevant_history}

Recent conversation:
{chat_history}

User: {input}
Assistant:"""

prompt = PromptTemplate(
    input_variables=["relevant_history", "chat_history", "input"],
    template=template
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ConversationChain(
    llm=llm,
    memory=combined_memory,
    prompt=prompt,
    verbose=True
)

# Conversation
chain.predict(input="Hi! I'm a backend developer.")
chain.predict(input="I've been learning Rust lately.")

# Long-term memory can survive into a new session, provided the FAISS
# index is persisted (e.g. vectorstore.save_local / FAISS.load_local)
chain.predict(input="How do I build a web server with the language I'm learning?")
# Can respond with "In Rust..."
Mem0: The Modern Memory Framework
There's a more convenient option than building this yourself. Mem0 is a modern open-source framework that provides a memory layer for AI applications.
from mem0 import Memory

m = Memory()

# Add memories (an LLM automatically extracts and stores the important info)
result = m.add(
    messages=[
        {"role": "user", "content": "I'm vegetarian, Korean, and live in Seoul. I like Python."},
        {"role": "assistant", "content": "Got it! I'll remember that."}
    ],
    user_id="user_123"
)

# Check what was stored (the exact response shape varies by mem0 version)
print(result)
# Typically extracts discrete facts such as:
#   "User is vegetarian"
#   "User is Korean and lives in Seoul"
#   "User prefers Python"

# Search memories (based on semantic similarity)
memories = m.search("recommend me some food", user_id="user_123")
# Returns the vegetarian and Seoul-related memories

# Use in an agent (llm is any chat model, e.g. the ChatOpenAI instance above)
context = "\n".join([item["memory"] for item in memories["results"]])
response = llm.invoke(
    f"User context:\n{context}\n\nRequest: recommend me some food"
)
Mem0 advantages:
- LLM automatically extracts and stores important info from conversations
- Automatic handling of duplicate and conflicting memories
- Memory separated by user, agent, and session
- Both REST API and Python SDK available
The Hard Problems in Memory Systems
Now that we know how to implement a memory system, let's look at the genuinely difficult parts.
Problem 1: Memory Conflicts
# 3 years ago: "I live in Seoul"
# Today: "I moved to Busan last year"
# How do you handle this?
# Option A: overwrite with latest information
# Option B: keep both with timestamps
# Option C: ask user to confirm
Mem0 uses an LLM to detect conflicts and update memories automatically, but it isn't perfect.
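Option B (keep both with timestamps) can be sketched as marking the old value as superseded rather than deleting it, so history is preserved. A toy version over a plain dict store (the structure here is illustrative, not Mem0's internal format):

```python
from datetime import datetime, timezone

def update_with_history(store, key, new_value):
    """Record a new fact while keeping the superseded one in history."""
    entry = store.setdefault(key, {"current": None, "history": []})
    if entry["current"] is not None:
        entry["history"].append(entry["current"])  # retain the old fact
    entry["current"] = {
        "value": new_value,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

store = {}
update_with_history(store, "location", "Seoul")
update_with_history(store, "location", "Busan")
# store["location"]["current"]["value"] is now "Busan";
# "Seoul" survives in store["location"]["history"]
```

Keeping history costs storage but lets the agent answer "where did I used to live?" and makes bad automatic merges recoverable.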
Problem 2: Privacy
How much should be remembered? This is both a technical and ethical question.
- Users must be able to request memory deletion (GDPR)
- Sensitive info (medical, financial) needs special handling
- Data from different users must never mix
# Memory deletion for GDPR compliance (m is the Mem0 instance from above)
def delete_user_memory(user_id: str):
    m.delete_all(user_id=user_id)
    print(f"All memories for user {user_id} deleted")
Problem 3: Appropriate Forgetting — Ebbinghaus's Forgetting Curve
Psychologist Ebbinghaus studied the pattern of human forgetting over time. This can be applied to AI memory as well.
import math
from datetime import datetime

def calculate_memory_importance(memory: dict) -> float:
    """
    Calculate the current importance of a memory.
    Importance decreases over time, but memories
    accessed frequently maintain their importance.
    """
    days_since_last_access = (
        datetime.now() - memory["last_accessed"]
    ).days
    access_count = memory["access_count"]

    # Based on the Ebbinghaus forgetting curve:
    # base retention decays over time
    base_retention = math.exp(-days_since_last_access / 30)

    # Reinforcement from access frequency
    # (frequently referenced = harder to forget)
    reinforcement = math.log(1 + access_count) * 0.3

    return min(1.0, base_retention + reinforcement)
# Periodically clean up low-importance memories
def prune_memories(user_id: str, threshold: float = 0.1):
    memories = m.get_all(user_id=user_id)
    for memory in memories:
        if calculate_memory_importance(memory) < threshold:
            m.delete(memory["id"])
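To get a feel for the curve, here are a few values computed directly from the formula above (30-day decay constant, log-scaled reinforcement; both constants are tuning knobs, not fixed by theory):

```python
import math

def retention(days_since_access, access_count):
    base = math.exp(-days_since_access / 30)      # time decay
    boost = math.log(1 + access_count) * 0.3      # access reinforcement
    return min(1.0, base + boost)

print(round(retention(0, 0), 3))    # fresh memory: 1.0
print(round(retention(30, 0), 3))   # a month untouched: 0.368
print(round(retention(30, 5), 3))   # a month old but accessed 5 times: 0.905
print(round(retention(90, 0), 3))   # three months untouched: 0.05
```

Note how five accesses lift a month-old memory from 0.368 back above 0.9: frequently used facts effectively never get pruned.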
Problem 4: Scalability
How do you manage memory for millions of users?
- Per-user independent vector index vs shared index
- Memory compression (increase info density through summarization)
- Hot/cold storage separation (frequently accessed vs. old memories)
- Sharding and distributed processing
Best practices here aren't fully established yet. In production, it's common to combine with infrastructure like Redis, Pinecone, or Weaviate.
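The hot/cold split can be as simple as promoting on access and demoting on overflow. A toy sketch of the policy only; in production the tiers would be backed by something like Redis (hot) and a database or object store (cold):

```python
class TieredMemoryStore:
    """Toy hot/cold policy: recently used memories stay in the hot tier."""

    def __init__(self, hot_capacity=100):
        self.hot = {}    # fast tier (stand-in for Redis / in-process cache)
        self.cold = {}   # slow tier (stand-in for a database / object store)
        self.hot_capacity = hot_capacity

    def get(self, memory_id):
        if memory_id in self.hot:
            return self.hot[memory_id]
        value = self.cold.pop(memory_id, None)
        if value is not None:
            self.put(memory_id, value)  # promote on access
        return value

    def put(self, memory_id, value):
        self.hot[memory_id] = value
        if len(self.hot) > self.hot_capacity:
            # Demote the oldest-inserted hot entry (dicts keep insertion order)
            evicted_id = next(iter(self.hot))
            self.cold[evicted_id] = self.hot.pop(evicted_id)
```

Real systems use smarter eviction (LRU, importance scores like the forgetting curve above) and move cold data asynchronously, but the promote/demote shape stays the same.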
Which Memory Strategy Should You Choose?
A simple decision guide.
| Use Case | Recommended Strategy |
|---|---|
| Simple chatbot | ConversationBufferWindowMemory (recent k) |
| Personalized assistant | Short-term + long-term (vector) combined |
| Customer service | Episodic + long-term memory |
| Coding assistant | Project codebase context + short-term |
| Production AI app | Mem0 or custom + privacy layer |
Wrapping Up
Memory is the key element that makes AI agents genuinely useful. Eliminating the frustration of "I have to explain this from scratch again?" is the next step for AI assistants.
If you're building an AI project right now, incorporate the memory system into your design from the start. Adding it later is much harder.
Start with a framework like Mem0. It's a great way to learn what problems have already been solved before building your own.
An AI with memory isn't just a tool — it can be a real collaborator. The difference is larger than you might expect.