The Memory Problem | OpenClaw Series

Most agent frameworks default to markdown for persistent state. Claude Code uses CLAUDE.md. OpenClaw uses MEMORY.md. The pattern makes sense: LLMs can read files and humans can edit them directly.

Markdown is a poor fit for operational state.

The symptom

The task list lived in HEARTBEAT.md. Bullet points for what needed doing.

Using a markdown file as a data store means the agent wakes up each session and has to piece together current state from whatever text it finds: stale bullets mixed with active ones, no timestamps, no way to count open items without reading the whole file, and sections that can quietly contradict each other. The agent infers state from context clues instead of reading it from a source of truth.

Actions and operational state belong in structured data. A database has rows, timestamps, and a status field. A markdown file has vibes.
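As a rough illustration of "rows, timestamps, and a status field", here is a minimal sketch of what an actions table could look like. Only the column names id, title, priority, skill, and status come from the query later in this post; the timestamp columns, the 'done' status value, the defaults, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema. The article only names id, title, priority, skill,
# and status; created_at, updated_at, and the 'done' value are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS actions (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    priority   INTEGER NOT NULL DEFAULT 3,
    skill      TEXT,
    status     TEXT NOT NULL DEFAULT 'open'
               CHECK (status IN ('open', 'doing', 'done')),
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""

conn = sqlite3.connect(":memory:")  # a real agent would point at a file
conn.execute(SCHEMA)

# Illustrative rows, not real data.
conn.execute(
    "INSERT INTO actions (title, priority, skill) VALUES (?, ?, ?)",
    ("Follow up on estimate", 1, "crm"),
)
conn.execute(
    "INSERT INTO actions (title, status) VALUES (?, 'done')",
    ("Send intro email",),
)

# Counting open items is a single query, not a full read of a text file.
open_count = conn.execute(
    "SELECT COUNT(*) FROM actions WHERE status = 'open'"
).fetchone()[0]
print(open_count)
```

The CHECK constraint is the part a markdown file cannot offer: a row is in exactly one known status, so stale and active items cannot blur together.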

What the industry does

Most frameworks treat memory as a retrieval problem. The default pattern is to embed everything into a vector database and retrieve similar chunks at query time. LangChain, LlamaIndex, and CrewAI all support this. Mem0 adds a graph store and a key-value store on top of the vector layer. Zep runs PostgreSQL, a graph database, and a search engine. CrewAI's default memory calls an LLM on every write to infer storage scope.

The latency numbers show the cost. Simple memory systems run at around 1 second. MemoryOS, one of the more complete hierarchical systems, runs at 32 seconds. Some systems require millions of tokens just to initialize.

The three-tier split below uses none of that. Each tier is a tool with known semantics that predates AI agents. Markdown is a text file. SQLite ships with Python's standard library as the sqlite3 module. The CRM is already the source of truth for the business. Nothing new to operate.

Three types of persistent state

Identity and instructions change slowly. Who the agent is, how it does prospecting, operating rules. Read once at session start. Markdown is the right format.

Operational state changes constantly. What needs doing, what happened, system health. Has structure, timestamps, relationships. Needs to be queried, not read. A database is the right format.

Business data lives in the CRM. Contacts, notes, estimates. Source of truth for the pipeline.

Three tiers. No overlap.

Before and after

The old heartbeat path used a markdown file as the task list.

Before (HEARTBEAT.md):

1. Read MEMORY.md
2. Read HEARTBEAT.md (scan bullets, guess what's current)
3. Read recent memory logs
4. Infer what to work on

That workflow depended on someone keeping HEARTBEAT.md current by hand. The file could mix active work, stale notes, and partially completed items in the same list, and the agent had no reliable way to tell which bullets still mattered.

After (heartbeat + SQLite):

SELECT id, title, priority, skill
FROM actions WHERE status IN ('open', 'doing')
ORDER BY priority ASC;

The heartbeat path still starts from HEARTBEAT.md, but only as an instruction to query SQLite. The task state no longer lives in markdown. Instead of scanning a file and inferring priority, the agent triggers a database query and gets ordered rows back.
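From the agent's side, the heartbeat step could be wrapped in a small helper like the one below. This is a sketch, not the actual OpenClaw code: the function name next_actions, the in-memory database, the trimmed-down table definition, and the sample rows are all assumptions. Only the query itself is taken from above.

```python
import sqlite3

# The heartbeat query from the article, unchanged.
HEARTBEAT_QUERY = """
SELECT id, title, priority, skill
FROM actions WHERE status IN ('open', 'doing')
ORDER BY priority ASC;
"""

def next_actions(conn: sqlite3.Connection) -> list[dict]:
    """Return open and in-progress actions, lowest priority number first."""
    conn.row_factory = sqlite3.Row  # lets each row convert to a dict by column name
    return [dict(row) for row in conn.execute(HEARTBEAT_QUERY)]

# Demo with an in-memory database and illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions ("
    "id INTEGER PRIMARY KEY, title TEXT, priority INTEGER, skill TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO actions (title, priority, skill, status) VALUES (?, ?, ?, ?)",
    [
        ("Log site visit notes", 2, "crm", "open"),
        ("Call back new lead", 1, "prospecting", "doing"),
        ("Old estimate", 1, "crm", "done"),  # filtered out by the status clause
    ],
)

rows = next_actions(conn)
for row in rows:
    print(row)
```

The completed row never reaches the agent, and the remaining two arrive already ordered, so there is nothing left to infer.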

What stays in markdown

SOUL.md, USER.md, MEMORY.md: identity, read once per session, rarely changes
memory/YYYY-MM-DD.md: human-readable audit trail

SOUL.md, USER.md, and MEMORY.md define who the agent is; the daily logs record what it did. Neither holds what it's doing now.
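The daily audit trail in memory/YYYY-MM-DD.md could be written by a small append-only helper. The sketch below assumes one bullet per entry with an HH:MM prefix; that line format, the function name append_log, and the sample entry are hypothetical, and only the file naming pattern comes from the list above.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional

def append_log(entry: str, root: Path, now: Optional[datetime] = None) -> Path:
    """Append one bullet to root/YYYY-MM-DD.md and return the file path."""
    now = now or datetime.now()
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{now:%Y-%m-%d}.md"
    # Append mode: earlier entries are never rewritten, which keeps it an audit trail.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {now:%H:%M} {entry}\n")
    return path

# Demo in a temporary directory with a fixed timestamp.
with tempfile.TemporaryDirectory() as tmp:
    log_path = append_log(
        "Closed action 12", Path(tmp) / "memory", datetime(2025, 1, 15, 9, 30)
    )
    log_name = log_path.name
    log_text = log_path.read_text(encoding="utf-8")

print(log_name)
print(log_text, end="")
```

The log is written for people to read after the fact. The agent's working state still comes from the database, not from parsing these files.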

Tradeoff

This setup gives up fuzzy semantic recall. There's no way to ask "what did we discuss about X last month" without a structured query or loading relevant logs into context. The heavy systems handle that: temporal relationship tracking, automatic importance scoring, contradiction detection across sessions. For a general-purpose assistant working across unrelated domains, those matter.

For a single-purpose field agent running prospecting and CRM operations, the queries are structured and the domain is bounded. The infrastructure cost doesn't pay off here.
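To make the tradeoff concrete, here is roughly what "what did we discuss about X last month" looks like as a structured query. The events table, its columns, the recall helper, and the sample rows are all hypothetical; the point is that the match is an exact keyword plus a date range, not semantic similarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for "what happened" records.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, occurred_at TEXT, summary TEXT)"
)
conn.executemany(
    "INSERT INTO events (occurred_at, summary) VALUES (?, ?)",
    [
        ("2025-03-04 10:00:00", "Discussed roof estimate with customer"),
        ("2025-03-20 15:30:00", "Sent follow-up on gutter repair"),
        ("2025-04-02 09:00:00", "Customer approved roof estimate"),
    ],
)

def recall(conn: sqlite3.Connection, keyword: str, start: str, end: str) -> list:
    """Keyword match within [start, end); ISO-8601 text sorts chronologically."""
    return conn.execute(
        "SELECT occurred_at, summary FROM events "
        "WHERE summary LIKE ? AND occurred_at >= ? AND occurred_at < ? "
        "ORDER BY occurred_at",
        (f"%{keyword}%", start, end),
    ).fetchall()

hits = recall(conn, "roof", "2025-03-01", "2025-04-01")
misses = recall(conn, "shingles", "2025-03-01", "2025-04-01")
print(hits)
print(misses)
```

A search for "shingles" returns nothing even though the roof estimate is clearly related. That gap is the fuzzy recall this setup gives up, and for bounded, structured queries it rarely comes up.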