Recently I attended an instructor-led course on “Advanced RAG Concepts”. My instructor was a very knowledgeable, hands-on person who runs a successful AI company on the West Coast. During the session, he taught us how his coding agent (obviously, Claude) remembers what really matters.
He started by explaining this in the context of retrieval, and then showed us how his own system works. That was really interesting and useful: the coding agent always stays in context, which reduces the chances of it getting derailed. That is something I have been struggling with of late.
Here I am trying to share that concept with you all. Mainly there are two ingredients for this recipe!
- Memory itself (defining and initializing)
- Telling the agent to use the memory
Credit where it is due! All praise to my teacher – Dr. Balaji Viswanathan.
This worked for me. I hope it works for you too.
Part 1 – Memory itself
This memory system is a mimicry of our own brain. Like us, this system will have 3 levels of memory.
- Working memory – This is hot memory that holds what is currently happening.
- Episodic memory – This is semi-hot memory where a summary is stored for quick reference. It is mostly about a particular task/activity/event.
- Durable memory – This is a kind of cold storage. You write only the decisions into this memory, because the process of getting there might not matter, but the decision does.
As a first step, create a command for Claude (the mkdir is there in case the commands folder does not exist yet):
mkdir -p ~/.claude/commands
cd ~/.claude/commands/
touch init-memory.md
Now paste this into that file:
---
description: Scaffold the agent memory store (working memory, episodic, durable long-term) as files.
argument-hint: "[docs-dir] (default: docs)"
allowed-tools: Write, Bash(mkdir:*), Bash(ls:*)
---
You are scaffolding the **file-based agent memory store** for this
project. This is the WM → episodic → durable layout.
## Target directory
Use `$1` as the docs directory if one is passed; otherwise default to `docs`. Call it `<DOCS>` below. Create `<DOCS>/episodic/` and `<DOCS>/durable/` if they don't exist.
## Rule: never clobber memory
Before writing each file, check if it already exists. If it does, **skip it and say so** - these files accumulate real conversation state. Only create the ones that are missing. The two `README.md` files and `wm.md` are safe to (re)create only if absent.
## Create exactly these files
### 1. `<DOCS>/wm.md` - working memory (single volatile file)
```markdown
---
kind: working-memory
session_id: sess-REPLACE-ME
user_id: u-REPLACE-ME
window: 8 # max turns kept verbatim before REM trims
turn_count: 0
last_rem_at: null
updated_at: null
---
# Working Memory
> The **now**: the last few turns of the current session, held **verbatim** and
> cheap to read. Single, volatile file - overwritten as the conversation moves
> and **trimmed/cleared by the REM cycle** once turns are consolidated into
> `episodic/`. Do not summarize here; raw turns only. Each turn is
> `consolidated: no` until a REM cycle folds it into an episode.
## Turns (oldest → newest)
<!-- ### turn 1 · user · consolidated: no
...verbatim turn text... -->
<!--
REM CYCLE (off the hot path - session end / every `window` turns / idle):
1. read the turns above where `consolidated: no`
2. distill → one episode file in episodic/ (summary + facts + salience)
3. merge durable facts → durable/<user_id>.yaml
4. set each consolidated turn to `yes` (or drop it), set last_rem_at, trim to window
-->
```
### 2. `<DOCS>/episodic/README.md` - episodic memory format
```markdown
# Episodic Memory - `episodic/`
The **past**: durable, consolidated **episodes**, one markdown file each. The
REM cycle writes these from `../wm.md`; they persist across sessions and are
recalled by similarity at query time.
## File convention
- One file per episode: `ep-<created-date>-<short-slug>.md`.
- **Frontmatter is the recall index**; the **body is the distilled summary**.
- Append-only in spirit: REM adds episodes; decay lowers `salience` and prunes.
| field | meaning |
|---|---|
| `id` | stable episode id |
| `user_id` | whose memory - recall is **always** user-scoped |
| `session_id` | the session this episode was consolidated from |
| `salience` | `0.0–1.0` importance - drives recall ranking + decay |
| `access_count` | bumped each time the episode is recalled (reinforcement) |
| `last_used_at` | recency for decay; refreshed on recall |
| `created_at` | when the REM cycle wrote it |
| `facts` | durable facts extracted this session (also merged into `../durable/`) |
> Embeddings are **not** stored in the file - the summary body is embedded at
> recall time (keyless MiniLM). The markdown stays human-readable.
## Recall (query time)
Rank this user's episodes by `similarity(query, summary) × f(salience, recency)`,
take top-k, read their summary bodies into context, then `access_count++` and
refresh `last_used_at` on the ones used.
```
### 3. `<DOCS>/episodic/ep-EXAMPLE.md` - one fully-worked episode (delete after reading)
A complete episode with every field populated, so the document structure is
unmistakable. Drawn from Lab 4's PM persona.
```markdown
---
id: ep-EXAMPLE-pm-onboarding
user_id: u-EXAMPLE
session_id: sess-EXAMPLE
salience: 0.78
access_count: 0
last_used_at: null
created_at: null
facts:
  - fact: "User is a product manager"
    kind: identity # identity | preference | goal
    confidence: 0.98
  - fact: "User is new to AI"
    kind: identity
    confidence: 0.95
  - fact: "User does not write code"
    kind: identity
    confidence: 0.97
  - fact: "Wants non-coding, product-facing AI skills first"
    kind: goal
    confidence: 0.85
---
# Episode - PM onboarding & first learning path (EXAMPLE - delete me)
A product manager new to AI, who does not code, asked what to learn first.
Guidance: focus on non-technical, product-facing AI skills - prompting,
AI-assisted PRDs, and evals for product. The thread was trending toward "is
there a course for that?", so the most useful thing to recall next is the
**AI for PMs** offering. This body is what gets embedded for similarity recall.
<!-- Written by the REM cycle from wm.md. The `facts` above were also merged
into durable/<user_id>.yaml. -->
```
### 4. `<DOCS>/durable/README.md` - durable long-term memory format
```markdown
# Durable Long-Term Memory - `durable/`
The **merged, durable profile** of each user: identity, preferences, and goals
that persist and personalize answers across every session.
- One YAML file per user: `<user_id>.yaml`.
- Written/updated by the **REM cycle**: facts from a session's episode are
**merged** here (not blindly appended). This is the file the agent reads for
personalization.
## Merge / conflict policy
- Higher `confidence` wins; ties break to the **newer** observation.
- Keep `first_seen` (never overwritten); update `last_seen` + `confidence`.
- A contradicting fact **supersedes** the old value, but the previous value
moves to `history` so the change is auditable, not silent.
`episodic/` keeps **what happened**; `durable/` keeps **what's true about the
user**.
```
### 5. `<DOCS>/durable/u-EXAMPLE.yaml` - one fully-worked profile (delete after reading)
The merged profile for the same PM persona - shows every section
(identity / goals / preferences / history) with full per-fact provenance.
```yaml
user_id: u-EXAMPLE
updated_at: null
# Merged by the REM cycle from episodes in episodic/. Each fact: value + kind +
# confidence + provenance. Conflicts resolve by confidence-then-recency;
# superseded values move to `history`.
identity:
  role:
    value: "Product manager"
    kind: identity
    confidence: 0.98
    first_seen: null
    last_seen: null
    source: ep-EXAMPLE-pm-onboarding
  ai_experience:
    value: "New to AI"
    kind: identity
    confidence: 0.95
    first_seen: null
    last_seen: null
    source: ep-EXAMPLE-pm-onboarding
  codes:
    value: false
    kind: identity
    confidence: 0.97
    first_seen: null
    last_seen: null
    source: ep-EXAMPLE-pm-onboarding
goals:
  learning_focus:
    value: "Non-coding, product-facing AI skills (prompting, AI-assisted PRDs, evals for product)"
    kind: goal
    confidence: 0.85
    first_seen: null
    last_seen: null
    source: ep-EXAMPLE-pm-onboarding
preferences:
  answer_style:
    value: "Non-technical; avoid code; product framing"
    kind: preference
    confidence: 0.80
    first_seen: null
    last_seen: null
    source: ep-EXAMPLE-pm-onboarding
# Superseded values land here so profile changes are auditable, not silent.
history: []
```
## After writing
Print a short tree of what you created (and what you skipped because it already
existed), then clearly call out:
- Rename the `sess-REPLACE-ME` / `u-REPLACE-ME` placeholders in `wm.md`.
- Delete the two `*EXAMPLE*` files once the user has read them.
Keep the summary to a few lines - be token-frugal (see CLAUDE.md).
That’s the whole trick – /init-memory becomes a slash command you can run inside any project. Claude reads the markdown, runs the mkdir and Write tools, and you get a docs/ folder with three sub-systems in place. The two EXAMPLE files are training wheels. Read them once, delete them.
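If it helps to see the "never clobber memory" rule as plain code rather than as instructions to Claude, here is a minimal Python sketch. The function name `scaffold` and the stub contents in `MEMORY_FILES` are assumptions made up for illustration; the real command leaves this work to Claude's Write tool and uses the full templates shown above.

```python
from pathlib import Path

# Hypothetical stand-ins for the five templates in init-memory.md.
MEMORY_FILES = {
    "wm.md": "# Working Memory\n",
    "episodic/README.md": "# Episodic Memory\n",
    "episodic/ep-EXAMPLE.md": "# Episode (EXAMPLE - delete me)\n",
    "durable/README.md": "# Durable Long-Term Memory\n",
    "durable/u-EXAMPLE.yaml": "user_id: u-EXAMPLE\n",
}


def scaffold(docs_dir: str, files: dict[str, str]) -> tuple[list[str], list[str]]:
    """Create the missing memory files under docs_dir and skip existing ones.

    Returns (created, skipped) as lists of relative paths.
    """
    root = Path(docs_dir)
    created, skipped = [], []
    for rel_path, content in files.items():
        target = root / rel_path
        # Creates docs/, docs/episodic/ and docs/durable/ as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            # Existing files may hold real conversation state: leave them alone.
            skipped.append(rel_path)
            continue
        target.write_text(content, encoding="utf-8")
        created.append(rel_path)
    return created, skipped
```

Calling `scaffold("docs", MEMORY_FILES)` twice in a row creates everything on the first call and skips everything on the second, which is the behaviour you want from a command you may re-run in a project that already has memory.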
Part 2 – Telling the agent to use the memory
Here is where most people stop. They scaffold the folder, feel good about it, and then forget that Claude has no idea it exists unless you explicitly tell it what to do with it.
That is what the CLAUDE.md file is for.
CLAUDE.md sits in your project root. Claude Code reads it at the start of every session. Think of it as your standing orders – not a prompt, not a chat message, but a permanent instruction sheet baked into how Claude sees your project.
Initialize your CLAUDE.md with the /init command. Along with your project details, add the section below to it. As a best practice, make sure the non-negotiable instructions sit on top, right after the project intro. I am sure the CLAUDE.md below is a bit of an overkill, but this is how I set it up when I am venturing into uncharted waters, especially the Accountability and Explanations section. Feel free to tweak this section!
## Non-negotiable instructions
- Do not simply agree to my requirements. Assess their practicality and maintainability first.
- If my requirement/request/instruction is not the optimal one, do not agree with me right away. Propose an alternative, listen to my reasoning, and then weigh in on the final action.
- Follow the Memory System Operations instructions.
- **Never edit on the main branch.**
- If starting implementation of a new implementation plan, start a feature branch.
- While on main, if fixing any issues, do that on a development branch.
- Once implementation is done and all tests pass, ask me and then merge to main.
### BE TOKEN FRUGAL
- Spawn a sub-agent:
  - only after completely preparing the context and task for that sub-agent.
  - only if it is absolutely necessary.
- Judge the task delegated to a sub-agent. If you think a smaller model (haiku) can perform the task up to the mark, opt for it. Otherwise, use the current model.
- Initiate a tool call only if it is absolutely necessary. If something can be done without a tool call, do so.
- Be a Shylock-level miser in spending tokens - except for writing code.
- Interactions with me can be in minimal words - DO NOT SPEND tokens on explaining things in long form.
- DO NOT OUTPUT your thoughts unnecessarily. TOKEN FRUGAL.
### 1. Accountability & Explanations (The "No Magic" Rule)
* **Assume Novice RAG Knowledge, Expert Engineering:** Do not gloss over RAG-specific terminology. When proposing a solution (e.g., specific chunking strategies, embedding models, or vector distance metrics), you must briefly explain *why* it is the best choice for this specific system.
* **Propose Before Building:** Do not write large blocks of pipeline code without first presenting a brief architectural plan and getting my explicit sign-off.
* **Flag Anti-Patterns:** If I request a feature or workflow that violates RAG best practices (e.g., dumping raw, un-chunked documents into the context window, or skipping evaluation), you must stop, flag the issue, and propose a standard-compliant alternative.
* **Fidelity Checks:** At the end of every major change or milestone, perform a fidelity check of the codebase against the spec and the implementation plan.
### 2. Architectural Standards & Modularity
* **Decoupled Components:** The pipeline must be strictly modular. Document ingestion/parsing, chunking, embedding, vector storage (e.g., PostgreSQL/pgvector), retrieval, re-ranking, and generation must be distinct, swappable Python classes or functions.
* **API-First Design:** The core retrieval and generation endpoints must be designed cleanly so they can be easily consumed by external frontends or CMS platforms (such as Drupal).
* **Containerization:** All system dependencies, local model runners, and database connections must be easily reproducible in a Dockerized environment.
## Memory System Operations (WM → Episodic → Durable)
You are responsible for maintaining the project's file-based memory system located in `docs/`. This system consists of Working Memory (`wm.md`), Episodic Memory (`episodic/`), and Durable Memory (`durable/`).
### 1. Reading & Context Gathering (Always Do This)
* **Identity & Preferences:** Before providing major architectural answers or personalizing guidance, check `docs/durable/<user_id>.yaml`. Adhere strictly to the preferences and skill levels defined there.
* **Immediate Context:** Read `docs/wm.md` to understand the exact recent history of the current session.
* **Historical Context:** If the user references past conversations or tasks, search `docs/episodic/` for relevant Markdown files. Increment the `access_count` and update `last_used_at` for any episode you retrieve.
### 2. Active Session Tracking (Working Memory)
* After every significant exchange, append the interaction to `docs/wm.md` under the `## Turns` section.
* Format it as verbatim text wrapped in an HTML comment block with the metadata `consolidated: no`, tracking the `user` and `agent` turns.
* Update the `turn_count` in the frontmatter.
### 3. Executing the REM Cycle (Consolidation)
When `wm.md` reaches its max `window` (e.g., 8 unconsolidated turns), or at the explicit end of a session, you must execute the REM cycle off the hot path:
1. **Distill to Episodic:** Read all turns in `wm.md` marked `consolidated: no`. Create a new file in `docs/episodic/` named `ep-<date>-<slug>.md`.
    * Write the body as a concise summary of the turns.
    * Populate the YAML frontmatter with extracted `facts` (value, kind, confidence).
2. **Merge to Durable:** Open `docs/durable/<user_id>.yaml`. Merge the new facts extracted from the episode.
    * **Conflict Resolution:** Higher `confidence` wins. Ties break to the newer observation.
    * **Auditing:** Move any superseded values into the `history` array. Never silently overwrite.
3. **Trim Working Memory:** Delete the consolidated turns from `docs/wm.md`, update `last_rem_at`, reset the `turn_count`, and ensure only the active context remains.
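A note outside the CLAUDE.md block: the episodic README ranks episodes by `similarity(query, summary) × f(salience, recency)` but leaves `f` unspecified. The Python sketch below shows one possible reading. The `Episode` class, the exponential half-life decay and the 30-day default are all assumptions chosen for illustration, and `similarity` is taken as already computed by whatever embedding model you use.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    id: str
    salience: float        # 0.0-1.0, from the episode frontmatter
    age_days: float        # days since last_used_at (stand-in for a timestamp)
    similarity: float      # similarity(query, summary), assumed precomputed
    access_count: int = 0


def recall_score(ep: Episode, half_life_days: float = 30.0) -> float:
    """similarity x f(salience, recency), with f = salience * half-life decay.

    The decay shape is an assumption; the README only says f depends on
    salience and recency.
    """
    recency = 0.5 ** (ep.age_days / half_life_days)
    return ep.similarity * ep.salience * recency


def recall(episodes: list[Episode], k: int = 3) -> list[Episode]:
    """Return the top-k episodes and reinforce the ones that were used."""
    ranked = sorted(episodes, key=recall_score, reverse=True)[:k]
    for ep in ranked:
        ep.access_count += 1   # reinforcement on recall
        ep.age_days = 0.0      # stands in for refreshing last_used_at
    return ranked
```

With this choice of `f`, an episode that has not been touched for one half-life scores half as much as an otherwise identical fresh one, so stale episodes fade unless they keep getting recalled.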
Why does this actually work?
Three reasons:
1. The agent now has a reading order.
Working memory first (what just happened), episodic memory next (what happened in earlier sessions), and then durable memory (what is always true about you). It is not guessing. It is not relying on whatever is left in the context window.
2. Consolidation happens off the hot path.
The REM cycle runs when the window fills up or the session ends – not mid-conversation. You never feel a pause. The cleanup happens quietly at the edges.
3. Conflicts are auditable, not silent.
The merge policy – confidence wins, ties go to newer, superseded values move to history – means the durable profile never lies to you. If Claude’s understanding of your preferences changed, you can see why.
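To make the merge policy concrete, here is a minimal Python sketch of it. In the real setup Claude applies these rules itself while editing the YAML file; the function `merge_fact`, the shape of the `new` dictionary and the choice to track `first_seen` per key are assumptions for illustration only.

```python
from copy import deepcopy


def _new_wins(old: dict, new: dict) -> bool:
    """Higher confidence wins; a tie goes to the newer observation."""
    if new["confidence"] != old["confidence"]:
        return new["confidence"] > old["confidence"]
    # ISO date strings compare correctly as text; null last_seen counts as oldest.
    return new["seen"] >= (old["last_seen"] or "")


def merge_fact(profile: dict, section: str, key: str, new: dict) -> dict:
    """Return a copy of profile with `new` merged into profile[section][key].

    `new` has: value, kind, confidence, seen (ISO date string), source.
    """
    merged = deepcopy(profile)
    merged.setdefault("history", [])
    facts = merged.setdefault(section, {})
    old = facts.get(key)

    entry = {
        "value": new["value"], "kind": new["kind"],
        "confidence": new["confidence"],
        "first_seen": new["seen"], "last_seen": new["seen"],
        "source": new["source"],
    }
    if old is None:
        facts[key] = entry
    elif old["value"] == new["value"]:
        # Same fact observed again: keep first_seen, refresh the rest.
        old["last_seen"] = new["seen"]
        old["confidence"] = max(old["confidence"], new["confidence"])
    elif _new_wins(old, new):
        # Contradiction: the old value moves to history, never silently lost.
        merged["history"].append({"key": f"{section}.{key}", **old})
        entry["first_seen"] = old["first_seen"]  # assumption: tracked per key
        facts[key] = entry
    return merged
```

A lower-confidence contradiction leaves the profile untouched in this sketch. Whether such observations should be logged somewhere is a design choice the policy above does not settle.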
Putting it all together
Here is the workflow once everything is wired up:
- Navigate to a project in Claude Code
- Run `/init-memory` (optionally pass a custom docs dir: `/init-memory memory`)
- Open `docs/episodic/ep-EXAMPLE.md` and `docs/durable/u-EXAMPLE.yaml`, read them, delete them
- Update `docs/wm.md` – replace `sess-REPLACE-ME` and `u-REPLACE-ME` with real values
- The CLAUDE.md instructions kick in automatically from here
Every session, Claude reads working memory for fresh context, checks your durable profile for preferences, and writes new turns back into working memory as the conversation unfolds. When the window fills, it consolidates quietly. Next session, it starts informed.
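The "when the window fills, it consolidates" behaviour can be pictured with a small in-memory model. This is only a sketch: the `WorkingMemory` class is a hypothetical stand-in for `wm.md`, and the actual distilling into an episode is still done by Claude following the CLAUDE.md instructions.

```python
class WorkingMemory:
    """In-memory stand-in for wm.md: verbatim turns plus a window size."""

    def __init__(self, window: int = 8):
        self.window = window
        self.turns: list[str] = []

    def append(self, turn: str) -> list[str]:
        """Add a verbatim turn.

        Returns the batch to consolidate once the window fills, else [].
        """
        self.turns.append(turn)
        if len(self.turns) >= self.window:
            return self.flush()
        return []

    def flush(self) -> list[str]:
        """Hand all pending turns to the REM step and clear working memory.

        Also meant to be called at the explicit end of a session.
        """
        batch, self.turns = self.turns, []
        return batch
```

Every `append` is cheap, and the expensive step (summarizing the returned batch into an episode) only happens when a batch comes back, which is what keeps consolidation at the edges of the conversation.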
I know your next question: what if I don't use Claude Code? No worries, most coding agents have their own way of adding a command. Usually it lives somewhere like ~/.[codingagent]/commands, and instead of CLAUDE.md the instruction file will typically be AGENTS.md. That's all.
What I realized sitting in that class is that the problem was never about model capability. It was about structure. Claude is not going to remember you across sessions unless you give it a filing system and the instructions to use it. This is that filing system. Simple, auditable, and – most importantly – it stays inside your project where you can see exactly what the agent knows.
Try it on your next project. Run /init-memory, add the CLAUDE.md block, and watch the difference in a long multi-session build.