How I Tightened OpenClaw Memory So Long Sessions Stop Falling Apart

My OpenClaw memory setup wasn’t starting from zero. I already had daily logs for raw continuity, atomic knowledge cards searched with local Ollama embeddings, a slim MEMORY.md index, compaction in safeguard mode, and context pruning turned on. The foundation worked well for basic session continuity.

But after reviewing how other people were tuning long-running agent setups, I found weak points in my own config. None of them looked catastrophic alone. Together, they explained a pattern I’d been seeing in longer conversations: the agent stayed functional while gradually losing the thread of what actually mattered in the current session.

That’s the subtle failure mode with memory systems. The agent doesn’t break. It just gets slightly worse, then slightly more generic, then slightly less grounded in what just happened.

So I tightened the setup. Here’s what was already working, what I fixed, and why each change matters.

The Memory Architecture I Was Already Using

Four layers working together:

  1. Daily logs (memory/YYYY-MM-DD.md) for raw session continuity
  2. Knowledge cards (memory/cards/*.md) for durable facts and lessons
  3. A slim MEMORY.md (~2KB) as a curated index, not a data dump
  4. Semantic search over the cards using qwen3-embedding:8b on Ollama (local, free)

The key design choice: not everything belongs in one giant memory file. Daily logs are cheap and messy on purpose. Knowledge cards are the opposite: small, focused, worth retrieving later. MEMORY.md stays short and acts as a map.

A knowledge card looks like this:

---
topic: OpenClaw memory card pattern
category: system
tags: [openclaw, memory, retrieval]
created: 2026-03-23
---

- Store one durable idea per card (~350 tokens)
- Keep wording concrete so embeddings match well
- Link to related cards instead of creating catch-all files
- Promote only information that's likely to matter again
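Mechanically, the card-search layer is simple: embed each card once, embed the query, rank by cosine similarity. Here's a minimal sketch of that loop against a local Ollama instance. The endpoint and response shape follow Ollama's `/api/embeddings` REST API; everything else (function names, the `(path, vector)` card tuples) is illustrative, not OpenClaw's actual implementation.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local Ollama endpoint

def embed(text, model="qwen3-embedding:8b"):
    """Fetch an embedding vector for one card (or query) from local Ollama."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, cards):
    """cards: list of (path, vector) pairs; returns them ranked by similarity."""
    return sorted(cards, key=lambda c: cosine(query_vec, c[1]), reverse=True)
```

Because cards are atomic and concretely worded, a plain cosine ranking like this tends to surface the right card without any reranking.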

I also had compaction safeguards, memory flush, and context pruning with cache-ttl mode already enabled. Solid baseline. The gaps were in what happened when sessions got long.

What I Changed (Five Fixes)

1. Explicit Compaction Tuning

I was relying too much on defaults. Defaults are fine until a session gets dense with planning, code, tool output, and memory events all in the same transcript.

{
  "keepRecentTokens": 20000,
  "recentTurnsPreserve": 4,
  "maxHistoryShare": 0.7,
  "reserveTokens": 30000
}

  • keepRecentTokens: 20000 protects the active working set from being summarized too early.
  • recentTurnsPreserve: 4 keeps the last four exchanges verbatim, so the agent retains immediate intent, not just topic.
  • maxHistoryShare: 0.7 prevents old transcript material from crowding out the present.
  • reserveTokens: 30000 leaves room for the model to actually think and act after compaction, instead of arriving at the next turn already cramped.

The practical effect: recent work survives. Sounds obvious, but if recent work gets summarized away, everything else in the memory stack becomes less useful.
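To make the interaction between these four knobs concrete, here's a sketch of how a compactor might apply them. This is my reading of the semantics, not OpenClaw's actual code: `should_compact` decides when history has crowded out the working budget, and `split_for_compaction` walks backwards through turns, keeping the last few verbatim plus whatever fits in the recent-token budget.

```python
def should_compact(history_tokens, total_budget,
                   reserve_tokens=30000, max_history_share=0.7):
    """Compact once history exceeds its share of the non-reserved budget."""
    usable = total_budget - reserve_tokens
    return history_tokens > usable * max_history_share

def split_for_compaction(turns, keep_recent_tokens=20000, recent_turns_preserve=4):
    """turns: list of (text, token_count) pairs, oldest first.
    Returns (to_summarize, to_keep_verbatim)."""
    kept, budget = [], keep_recent_tokens
    for i, (text, tokens) in enumerate(reversed(turns)):
        # The last N turns survive unconditionally; older ones only if
        # they still fit inside the recent-token budget.
        if i < recent_turns_preserve or tokens <= budget:
            kept.append((text, tokens))
            budget -= tokens
        else:
            break
    kept.reverse()
    return turns[: len(turns) - len(kept)], kept
```

With a 128K window, reserving 30K and capping history at 70% of the rest means compaction fires once history passes roughly 68K tokens, well before the model is cramped.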

2. Custom Memory Flush Prompt

OpenClaw can flush memory, but generic summaries aren’t enough for a structured system. I don’t want the agent rewriting the conversation into prose. I want it to sort information into the right storage tier.

{
  "softThresholdTokens": 32000,
  "forceFlushTranscriptBytes": "2mb",
  "prompt": "Write terse bullet points only. Save routine continuity to the daily log. Promote significant reusable items to memory/cards/. Prefer atomic notes. No conversation rewrites."
}

The softThresholdTokens: 32000 setting means the agent starts externalizing important state before compaction pressure gets severe. Memory flush becomes routine maintenance instead of emergency salvage.
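The "routing, not summarization" idea is easiest to see as code. Here's a hypothetical sketch of what a flush router could do with the bullet points the prompt asks for: the `[card: topic]` tag is a convention I invented for this example, not an OpenClaw feature, but the split it implements (routine continuity to the daily log, durable items to `memory/cards/`) is exactly what the prompt instructs the agent to do.

```python
from datetime import date
from pathlib import Path

def route_flush(bullets, memory_dir="memory"):
    """Split flush bullets between the daily log and memory/cards/.

    A bullet tagged "[card: some-topic]" is promoted to its own atomic card;
    everything else is routine continuity and appends to today's daily log.
    """
    root = Path(memory_dir)
    cards = root / "cards"
    cards.mkdir(parents=True, exist_ok=True)
    log_lines = []
    for bullet in bullets:
        if bullet.startswith("[card:"):
            topic, _, body = bullet[len("[card:"):].partition("]")
            name = topic.strip().lower().replace(" ", "-") + ".md"
            (cards / name).write_text(body.strip() + "\n")
        else:
            log_lines.append(bullet)
    daily = root / f"{date.today():%Y-%m-%d}.md"
    with daily.open("a") as f:
        f.writelines(line + "\n" for line in log_lines)
```

The point of the structure is that each destination has different retrieval properties: daily logs are scanned by date, cards are found by embedding search.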

3. Transcript Size Safety Net

Token thresholds are useful, but transcripts can grow in surprising ways. Tool-heavy sessions are the classic example. The forceFlushTranscriptBytes: "2mb" setting catches sessions that are simply too large at the transcript layer, even if token-based heuristics didn’t fire. Think of it as a seatbelt, not the steering wheel.
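The two triggers from fixes 2 and 3 compose into one trivially small check. This is an illustrative sketch of the combined logic, not OpenClaw internals:

```python
def flush_due(session_tokens, transcript_bytes,
              soft_threshold_tokens=32000,
              force_flush_bytes=2 * 1024 * 1024):
    """Token heuristic is the steering wheel; the byte cap is the seatbelt.
    Either one firing means it's time to externalize state."""
    return (session_tokens >= soft_threshold_tokens
            or transcript_bytes >= force_flush_bytes)
```

A tool-heavy session can blow past 2 MB of transcript while staying under the token threshold, which is exactly the case the byte cap exists for.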

4. QMD Session Transcript Search (The Biggest Fix)

This was the biggest gap in my setup. I had QMD installed but session transcript search was disabled. The agent could search memory files and knowledge cards, but not raw conversation history.

Huge blind spot.

Not every useful fact gets promoted into a memory card. Not every recent decision belongs in MEMORY.md. Sometimes the thing the agent needs is buried in actual session history: a constraint, a one-off decision, a naming choice, a half-finished plan.

{
  "sessionTranscriptSearch": {
    "enabled": true,
    "retentionDays": 90
  }
}

This gives the agent access to raw recall, not just curated recall. Curated memory is high quality but lossy by design. Transcript search restores detail when detail matters. For ongoing project work, debugging, and workflows where exact earlier wording matters, this is huge.

If you only do one upgrade from this post, do this one.
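To show why raw recall matters, here's a deliberately naive sketch of transcript search: a substring scan over JSONL session files inside the retention window. I'm assuming one JSON object per line with a "text" field; QMD's real index is certainly richer than this, but even this crude version would recover a buried naming choice or one-off constraint that never got promoted to a card.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

def search_transcripts(query, sessions_dir, retention_days=90):
    """Substring search over JSONL session transcripts within retention."""
    cutoff = datetime.now() - timedelta(days=retention_days)
    hits = []
    for path in Path(sessions_dir).glob("*.jsonl"):
        if datetime.fromtimestamp(path.stat().st_mtime) < cutoff:
            continue  # outside the retention window
        for line in path.read_text().splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the search
            if query.lower() in entry.get("text", "").lower():
                hits.append((path.name, entry["text"]))
    return hits
```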

5. Post-Compaction Section Reinjection

Long sessions don’t just lose facts. They can also lose behavior.

After enough compaction, the agent still responds but becomes blander, less consistent, less anchored to its operating rules. In my setup, those rules live in AGENTS.md. So I configured OpenClaw to reinject key sections after compaction:

{
  "postCompactionSections": [
    "Every Session",
    "Memory",
    "Safety",
    "Group Chats",
    "Multi-Agent Workflow"
  ]
}

This keeps memory habits, safety boundaries, and communication style intact across long sessions. Without it, personality drift is real.
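Under the hood, reinjection is just pulling named sections out of a markdown file and appending them to the post-compaction context. Here's a minimal sketch of the extraction half, assuming the AGENTS.md sections named in the config live under level-2 headings; adjust the pattern if your file is structured differently.

```python
import re

def extract_sections(agents_md, wanted):
    """Pull the named '## Heading' sections out of AGENTS.md text.

    agents_md: full file contents as a string.
    wanted: set of section titles to reinject after compaction.
    """
    sections = {}
    current, buf = None, []
    for line in agents_md.splitlines():
        m = re.match(r"##\s+(.*)", line)
        if m:
            if current in wanted:
                sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1).strip(), []
        else:
            buf.append(line)
    if current in wanted:  # flush the final section
        sections[current] = "\n".join(buf).strip()
    return sections
```

Reinjecting the extracted text after each compaction is what keeps the operating rules from being summarized into mush along with everything else.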

The System Now

The setup behaves like a layered memory pipeline:

  • Recent context is protected by explicit compaction tuning
  • Important facts get flushed before the transcript gets dangerous
  • Durable knowledge lives in atomic cards
  • Raw recall remains searchable through QMD transcripts
  • Behavioral identity gets rehydrated after compaction

No single mechanism is enough. Knowledge cards are great for retrieval. Daily logs handle continuity. Transcript search provides fidelity. Compaction tuning keeps the present alive. Post-compaction reinjection keeps the agent recognizable. Together, the system feels much less fragile.

Actionable Takeaways

  1. Don’t rely on defaults for compaction. Protect recent work explicitly.
  2. Treat memory flush as routing, not summarization. Decide what goes to logs, what becomes a card, and what should stay out.
  3. Add a transcript size safety net. Token logic alone isn’t enough.
  4. Enable transcript search. Curated memory without raw recall leaves a major hole.
  5. Reinject behavioral sections after compaction. Memory isn’t just facts. It’s also operating rules and style.
  6. Keep MEMORY.md small. Use it as an index, not a landfill.
  7. Prefer atomic knowledge cards. Small files retrieve better and age better.

My setup was already decent before these changes. But these updates made it much more resilient in exactly the situations where agent memory usually degrades: long sessions, mixed workloads, and real projects where continuity has to survive friction.