Agent Memory System Design: Everything About How AI Agents Remember

Why Memory Is the Hardest Problem in AI Agents

"Hi, my name is Alex." "Hello, Alex!" [conversation ends]

[new session starts] "Do you know my name?" "I'm sorry, I don't have any record of previous conversations..."

LLMs have no memory by default. Every conversation is the first conversation. This is one of the most fundamental limitations of current AI assistants.

The things we take for granted in human relationships — "what you told me last time," "what you like," "the problem we solved together" — AI can't do these by default.

That's why agent memory systems matter. This post covers everything from theory to production implementation.


Four Memory Types Borrowed From Human Psychology

The way psychologists classify human memory turns out to be surprisingly useful for designing AI memory systems.

1. Sensory Memory → Context Window

Sensory memory holds incoming information for a very brief time (0.5 to 3 seconds). In LLMs, this corresponds to the current input — the contents of the context window.

# This IS sensory memory:
messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Hello, Alex!"},
    {"role": "user", "content": "What's my name?"},
    # LLM can see the conversation above, so it can answer
]

Characteristics: fast access, but disappears when the conversation ends. Limited by context window size.
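How quickly that limit bites can be sketched in plain Python. This assumes a crude one-token-per-four-characters estimate (real tokenizers differ); anything that no longer fits the budget is silently dropped:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (real tokenizers differ)
    return max(1, len(text) // 4)

def trim_to_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep only the most recent messages that fit the token budget."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break                           # everything older is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Hello, Alex!"},
    {"role": "user", "content": "What's my name?"},
]
window = trim_to_window(messages, max_tokens=8)
# With this tiny budget the oldest message ("My name is Alex") is gone,
# so the model can no longer answer the question.
```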

2. Short-term / Working Memory → Conversation Buffer

Short-term memory temporarily holds information while working. The conversation buffer serves this role.

from langchain.memory import ConversationBufferWindowMemory

# Keep only the most recent 10 conversation exchanges
memory = ConversationBufferWindowMemory(k=10, return_messages=True)

# Add conversation
memory.save_context(
    {"input": "What's list comprehension in Python?"},
    {"output": "List comprehension is a concise way to create a new list based on an existing one."}
)

# Load memory (inject into LLM)
history = memory.load_memory_variables({})

Characteristics: maintains session context. But once it exceeds k entries, older ones are dropped. Cannot persist long-term.
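The windowing policy itself is nothing exotic. A bounded deque reproduces the same drop-the-oldest behavior (a sketch of the idea, not LangChain's actual internals):

```python
from collections import deque

k = 3
buffer = deque(maxlen=k)  # evicts the oldest item once full

for i in range(5):
    buffer.append((f"question {i}", f"answer {i}"))

# Only the 3 most recent exchanges survive; exchanges 0 and 1 are gone
print(list(buffer))
```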

3. Long-term Memory → Vector Store + External DB

Long-term memory stores information over time and retrieves it when needed. Vector stores play this role.

from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Create long-term memory store
embedding = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embedding)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

memory = VectorStoreRetrieverMemory(retriever=retriever)

# Save to memory
memory.save_context(
    {"input": "My favorite programming language is Python"},
    {"output": "Got it! You love Python."}
)

memory.save_context(
    {"input": "I live in Seoul and work at a startup"},
    {"output": "A startup in Seoul — noted!"}
)

# Later: retrieve relevant memories
relevant = memory.load_memory_variables({"prompt": "Recommend some tech for me"})
# Returns: memories about Python and Seoul startup

Key advantage: can pull only relevant memories from millions of stored entries. Retrieval based on semantic similarity.
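Under the hood, "semantic similarity" is typically a cosine score between embedding vectors. A toy sketch with made-up 3-dimensional vectors (a real system would use embedding-model outputs with hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings (entirely made up for illustration)
memories = {
    "User loves Python": [0.9, 0.1, 0.0],
    "User lives in Seoul": [0.1, 0.9, 0.1],
    "User is vegetarian": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "recommend some tech"

best = max(memories, key=lambda text: cosine(memories[text], query_vec))
# The tech-related memory scores highest
```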

4. Episodic Memory → Structured Event Log

Episodic memory records specific events and experiences. Time and context are included.

import json
from datetime import datetime

class EpisodicMemory:
    def __init__(self, db_connection):
        self.db = db_connection

    def record_episode(self, user_id: str, episode: dict):
        """Record a specific event or interaction"""
        self.db.insert("episodes", {
            "user_id": user_id,
            "timestamp": datetime.now().isoformat(),
            "event_type": episode["type"],  # "purchase", "complaint", "success"
            "summary": episode["summary"],
            "metadata": json.dumps(episode.get("metadata", {}))
        })

    def retrieve_episodes(
        self,
        user_id: str,
        event_type: str | None = None,
        limit: int = 10
    ):
        """Retrieve a user's past episodes"""
        if event_type:
            query = """
                SELECT * FROM episodes
                WHERE user_id = ? AND event_type = ?
                ORDER BY timestamp DESC LIMIT ?
            """
            return self.db.query(query, [user_id, event_type, limit])
        else:
            query = """
                SELECT * FROM episodes
                WHERE user_id = ?
                ORDER BY timestamp DESC LIMIT ?
            """
            return self.db.query(query, [user_id, limit])

# Example usage
memory = EpisodicMemory(db)
memory.record_episode("user_123", {
    "type": "purchase",
    "summary": "User purchased advanced Python course",
    "metadata": {"course_id": "py-advanced", "price": 89}
})
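The db_connection above is left abstract. One possible shape for it, sketched with SQLite; the insert()/query() interface is this post's own convention, not a standard API:

```python
import sqlite3

class SQLiteAdapter:
    """Minimal adapter exposing the insert()/query() interface assumed above."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows behave like dicts
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS episodes (
                user_id TEXT, timestamp TEXT,
                event_type TEXT, summary TEXT, metadata TEXT
            )
        """)

    def insert(self, table: str, row: dict):
        # Table/column names come from our own code; values are parameterized
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        self.conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
        )
        self.conn.commit()

    def query(self, sql: str, params: list) -> list[dict]:
        return [dict(r) for r in self.conn.execute(sql, params)]

db = SQLiteAdapter()
```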

Production Memory Architecture

Enough theory — let's see how to combine these in practice.

User Message
     |
     v
+--------------------------------------------------+
| Memory Router                                    |
| (gather relevant memories, prioritize)           |
+-----------+-----------+------------+----------+--+
            |           |            |          |
            v           v            v          v
         Sensory    Short-term   Long-term   Episodic
         (context)  (buffer)     (vector)    (event log)
            |           |            |          |
            +-----------+-----+------+----------+
                              |
                              v
                    Consolidated Context
               (select most relevant memories)
                              |
                              v
                       LLM Generation
                              |
                              v
                        Memory Update
            (store new info to right memory type)

In short:

  1. When a user message arrives, the Memory Router pulls relevant content from each memory type
  2. The most relevant memories are selected and assembled as context
  3. The LLM generates a response using that context
  4. New information gets stored in the appropriate memory type

In actual code:

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import (
    ConversationBufferWindowMemory,
    VectorStoreRetrieverMemory,
    CombinedMemory
)
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain

# Configure memory
short_term = ConversationBufferWindowMemory(
    k=5,
    memory_key="chat_history",
    return_messages=True
)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["placeholder"], embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
long_term = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="relevant_history"
)

# Combine both memories
combined_memory = CombinedMemory(memories=[short_term, long_term])

# Set up prompt
template = """
You are a personal AI assistant who remembers the user well.

Past relevant memories:
{relevant_history}

Recent conversation:
{chat_history}

User: {input}
Assistant:"""

prompt = PromptTemplate(
    input_variables=["relevant_history", "chat_history", "input"],
    template=template
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = ConversationChain(
    llm=llm,
    memory=combined_memory,
    prompt=prompt,
    verbose=True
)

# Conversation
chain.predict(input="Hi! I'm a backend developer.")
chain.predict(input="I've been learning Rust lately.")

# Beyond the recent-k window, long-term memory is still available
# (note: this FAISS index lives in process memory; persist it, e.g. with
# vectorstore.save_local(...), for it to survive a truly new session)
chain.predict(input="How do I build a web server with the language I'm learning?")
# Can respond with "In Rust..."

Mem0: The Modern Memory Framework

There's a more convenient option than building this yourself. Mem0 is a modern open-source framework that provides a memory layer for AI applications.

from mem0 import Memory

m = Memory()

# Add memories (LLM automatically extracts and stores important info)
result = m.add(
    messages=[
        {"role": "user", "content": "I'm vegetarian, Korean, and live in Seoul. I like Python."},
        {"role": "assistant", "content": "Got it! I'll remember that."}
    ],
    user_id="user_123"
)

# Check what was stored
print(result)
# [
#   {"memory": "User is vegetarian", "id": "..."},
#   {"memory": "User is Korean and lives in Seoul", "id": "..."},
#   {"memory": "User prefers Python", "id": "..."}
# ]

# Search memories (based on semantic similarity)
memories = m.search("recommend me some food", user_id="user_123")
# Returns: vegetarian, Seoul-related memories

# Use in agent (llm here is any chat model client, e.g. the ChatOpenAI instance from earlier)
context = "\n".join([item["memory"] for item in memories["results"]])
response = llm.invoke(
    f"User context:\n{context}\n\nRequest: recommend me some food"
)

Mem0 advantages:

  • LLM automatically extracts and stores important info from conversations
  • Automatic handling of duplicate and conflicting memories
  • Memory separated by user, agent, and session
  • Both REST API and Python SDK available

The Hard Problems in Memory Systems

Now that we know how to implement a memory system, let's look at the genuinely difficult parts.

Problem 1: Memory Conflicts

# 3 years ago: "I live in Seoul"
# Today: "I moved to Busan last year"

# How do you handle this?
# Option A: overwrite with latest information
# Option B: keep both with timestamps
# Option C: ask user to confirm

Mem0 uses an LLM to detect such conflicts and update memories automatically, but it is not foolproof.
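Option B can be sketched in a few lines: keep every version with a timestamp and treat the newest as current, so the history stays auditable (field names are illustrative):

```python
from datetime import datetime

facts = [
    {"key": "city", "value": "Seoul", "recorded": datetime(2021, 3, 1)},
    {"key": "city", "value": "Busan", "recorded": datetime(2023, 6, 1)},
]

def current_value(facts: list[dict], key: str):
    """Return the most recently recorded value for a key, or None."""
    versions = [f for f in facts if f["key"] == key]
    if not versions:
        return None
    return max(versions, key=lambda f: f["recorded"])["value"]

print(current_value(facts, "city"))  # the newer fact, "Busan", wins
```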

Problem 2: Privacy

How much should be remembered? This is both a technical and ethical question.

  • Users must be able to request memory deletion (GDPR)
  • Sensitive info (medical, financial) needs special handling
  • Data from different users must never mix

# Memory deletion for GDPR compliance
def delete_user_memory(user_id: str):
    m.delete_all(user_id=user_id)
    print(f"All memories for user {user_id} deleted")

Problem 3: Appropriate Forgetting — Ebbinghaus's Forgetting Curve

Psychologist Ebbinghaus studied the pattern of human forgetting over time. This can be applied to AI memory as well.

import math
from datetime import datetime

def calculate_memory_importance(memory: dict) -> float:
    """
    Calculate the current importance of a memory.
    Importance decreases over time, but memories
    accessed frequently maintain their importance.
    """
    days_since_last_access = (
        datetime.now() - memory["last_accessed"]
    ).days
    access_count = memory["access_count"]

    # Based on Ebbinghaus forgetting curve
    # Base forgetting: decreases over time
    base_retention = math.exp(-days_since_last_access / 30)

    # Reinforcement from access frequency
    # (frequently referenced = harder to forget)
    reinforcement = math.log(1 + access_count) * 0.3

    return min(1.0, base_retention + reinforcement)
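Plugging a few values into the base retention term above shows how quickly it decays with the 30-day time constant (access reinforcement set aside):

```python
import math

for days in (0, 30, 90):
    retention = math.exp(-days / 30)
    print(f"{days:>3} days -> {retention:.3f}")
# 0 days -> 1.000, 30 days -> 0.368, 90 days -> 0.050
```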

# Periodically clean up low-importance memories
def prune_memories(user_id: str, threshold: float = 0.1):
    memories = m.get_all(user_id=user_id)
    for memory in memories:
        if calculate_memory_importance(memory) < threshold:
            m.delete(memory["id"])

Problem 4: Scalability

How do you manage memory for millions of users?

  • Per-user independent vector index vs shared index
  • Memory compression (increase info density through summarization)
  • Hot/cold storage separation (frequently accessed vs. old memories)
  • Sharding and distributed processing

Best practices here aren't fully established yet. In production, it's common to combine with infrastructure like Redis, Pinecone, or Weaviate.
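The sharding bullet above can be sketched with stable hashing on the user id, so each user's memories always land on the same shard (shard names are illustrative):

```python
import hashlib

SHARDS = ["memory-shard-0", "memory-shard-1", "memory-shard-2", "memory-shard-3"]

def shard_for(user_id: str) -> str:
    """Map a user id to a fixed shard via a stable hash."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always routes to the same shard, across processes and restarts
assert shard_for("user_123") == shard_for("user_123")
```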


Which Memory Strategy Should You Choose?

A simple decision guide.

Use Case                  Recommended Strategy
----------------------    -----------------------------------------
Simple chatbot            ConversationBufferWindowMemory (recent k)
Personalized assistant    Short-term + long-term (vector) combined
Customer service          Episodic + long-term memory
Coding assistant          Project codebase context + short-term
Production AI app         Mem0 or custom + privacy layer

Wrapping Up

Memory is the key element that makes AI agents genuinely useful. Eliminating the frustration of "I have to explain this from scratch again?" is the next step for AI assistants.

If you're building an AI project right now, incorporate the memory system into your design from the start. Adding it later is much harder.

Start with a framework like Mem0. It's a great way to learn what problems have already been solved before building your own.

An AI with memory isn't just a tool — it can be a real collaborator. The difference is larger than you might expect.