The Problem
AI agents wake up fresh every session. To maintain continuity, they load memory files — but monolithic memory files waste context window on irrelevant information. An agent helping with product colors doesn't need the email pipeline. An agent debugging infrastructure doesn't need product pricing.
Context window is the agent's working memory. Every irrelevant token loaded is a token not available for reasoning. Overloaded context doesn't just waste capacity — it degrades performance. The agent's attention spreads thin across information it doesn't need, reducing the quality of its responses on the thing it does need.
This is the AI equivalent of trying to solve a math problem while someone reads you a novel.
The Solution
Domain Fragment Context Loading organizes agent memory into small, focused files (fragments) connected by cross-references and backed by semantic vector search. It mirrors human associative memory: you don't replay your entire life to answer a question — you recall what's relevant and follow mental links to related concepts.
Two Systems, Like Human Cognition
The human brain has (at least) two relevant memory systems:
- Long-term memory — stored knowledge, experiences, and lessons learned over time
- Fluid intelligence — the ability to pull the right knowledge at the right time, make novel connections, and reason with limited working memory
DFCL maps both onto an AI agent's file system:
| Cognitive System | DFCL Implementation | What It Does |
|---|---|---|
| Long-term memory | Domain fragments (focused .md files, <200 lines) | Stores everything the agent has learned, organized by domain |
| Fluid intelligence | Semantic vector search + cross-references | Surfaces the right fragment at the right time |
Long-term memory without fluid intelligence is a filing cabinet you can't navigate. Fluid intelligence without long-term memory is cleverness with no foundation. You need both.
Architecture
workspace/
├── MEMORY.md ← Index (slim, <50 lines)
│ ├── Fragment Map ← domain → file → key info
│ ├── Boot Sequence ← what to load on startup
│ └── Tooling notes
│
├── memory/
│ ├── fragments/ ← Domain knowledge (curated, stable)
│ │ ├── user-profile.md
│ │ ├── product-a.md
│ │ ├── product-b.md
│ │ ├── infrastructure.md
│ │ ├── e-commerce.md
│ │ ├── marketing.md
│ │ ├── lessons.md
│ │ └── ...
│ │
│ └── YYYY-MM-DD.md ← Daily notes (raw, chronological)
│ ├── Session logs
│ ├── Decisions made
│ └── Context that may graduate to fragments
│
└── SOUL.md / USER.md / etc. ← Identity & config (loaded at boot)
Fragment Rules
Cross-References: The Knowledge Graph
Fragments link to each other the way concepts link in your mind:
# Rack Adapter Blocks
**Parent:** [ironclan.md](ironclan.md)
**Buy Flow:** [shopify.md](shopify.md)
**Filament Details:** See Obsidian `3D Workshop/Filament Order.md`

This lets the agent walk the graph: start at one domain, follow links to pull in related context only when the task requires it. No link followed = no tokens spent.
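The link-following behavior can be sketched in a few lines of Python. This is an illustrative implementation, not part of DFCL itself: the `walk` helper and its `max_hops` budget are assumptions about how an agent might bound graph traversal.

```python
import re
from pathlib import Path

# Matches markdown cross-references like [ironclan.md](ironclan.md)
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+\.md)\)")

def linked_fragments(fragment_path: Path) -> list[Path]:
    """Return the local .md fragments that a fragment cross-references."""
    text = fragment_path.read_text(encoding="utf-8")
    return [fragment_path.parent / target for target in LINK_RE.findall(text)]

def walk(start: Path, max_hops: int = 1) -> list[Path]:
    """Load a fragment plus its cross-references, up to max_hops links deep.

    Links never followed cost nothing: only fragments actually visited
    would be read into context.
    """
    loaded: list[Path] = []
    frontier = [start]
    seen = {start}
    for _ in range(max_hops + 1):
        next_frontier = []
        for frag in frontier:
            if frag.exists():
                loaded.append(frag)
                for link in linked_fragments(frag):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
        frontier = next_frontier
    return loaded
```

With `max_hops=1`, a task touching one domain pulls in that fragment and its direct neighbors only; deeper links stay unloaded until needed.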
Runtime Behavior
Boot Sequence
Every session, fixed cost:
1. SOUL.md → Agent identity — who am I?
2. USER.md → Human profile — who am I helping?
3. Daily notes → Today + yesterday — what just happened?
4. MEMORY.md → The index only — what exists?

The agent knows what knowledge exists without loading all of it. Typical boot cost: ~300-500 lines, regardless of total memory size.
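A minimal sketch of the fixed-cost boot in Python. The file names come from the article's layout; loading yesterday's and today's daily notes and counting lines as a rough cost proxy are assumptions of this sketch.

```python
from datetime import date, timedelta
from pathlib import Path

def boot(workspace: Path) -> tuple[str, int]:
    """Load the fixed boot set — identity, user profile, recent daily notes,
    and the MEMORY.md index — and never the fragments themselves."""
    today = date.today()
    boot_files = [
        workspace / "SOUL.md",                          # who am I?
        workspace / "USER.md",                          # who am I helping?
        workspace / "memory" / f"{today:%Y-%m-%d}.md",  # what just happened?
        workspace / "memory" / f"{today - timedelta(days=1):%Y-%m-%d}.md",
        workspace / "MEMORY.md",                        # what exists?
    ]
    context: list[str] = []
    lines = 0
    for f in boot_files:
        if f.exists():  # missing daily notes are simply skipped
            text = f.read_text(encoding="utf-8")
            context.append(text)
            lines += text.count("\n") + 1
    return "\n\n".join(context), lines
```

Because the fragment directory is never touched here, the boot cost stays flat no matter how much long-term memory accumulates.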
On-Demand Recall
When a question or task arrives:
1. Search — describe what's needed; semantic search surfaces candidate fragments
2. Read — load only the fragment(s) that match
3. Follow — walk cross-references if the task requires related context
Memory Maintenance
| Cadence | Action |
|---|---|
| Every session | Log decisions and outcomes to daily notes |
| Every few days | Review daily notes → graduate durable info to fragments |
| On contradiction | Update immediately — stale fragments are worse than none |
| On growth | Past 200 lines? Split by subdomain |
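The "past 200 lines" rule lends itself to a mechanical check. A sketch of such a maintenance pass, assuming fragments live in a flat directory; the function name and report shape are illustrative:

```python
from pathlib import Path

FRAGMENT_LINE_LIMIT = 200  # past this, split the fragment by subdomain

def oversized_fragments(fragments_dir: Path) -> list[tuple[str, int]]:
    """List fragments that have outgrown the limit and are due for a split."""
    report = []
    for frag in sorted(fragments_dir.glob("*.md")):
        n = frag.read_text(encoding="utf-8").count("\n") + 1
        if n > FRAGMENT_LINE_LIMIT:
            report.append((frag.name, n))
    return report
```

Running this during the "every few days" review keeps growth from silently turning a fragment back into a monolith.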
The Semantic Search Layer
Vector search is what turns organized files into fluid intelligence. Without it, the agent has to know which file to read. With it, the agent describes what it needs and the right context surfaces automatically.
How It Works
Every fragment is embedded when it's written. At recall time, the agent's query is embedded the same way and matched against the stored vectors; the closest fragments surface without the agent having to name a file.
Reference Stack
| Component | Tool | Notes |
|---|---|---|
| Embedding model | nomic-embed-text | Local via Ollama, no API calls |
| Search mode | Hybrid | Semantic + keyword (BM25) |
| Vector storage | SQLite | Built-in, no external deps |
| Indexed content | memory/*.md | Auto-indexed on write |
The entire search pipeline runs locally. No data leaves the machine. This matters for agents with access to personal information.
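A runnable sketch of the semantic half of this stack. To keep it offline and self-contained, `embed` is a toy hashed bag-of-words stand-in for a real model (in practice you would call nomic-embed-text via Ollama), vectors are stored as comma-separated text in SQLite rather than blobs, and the BM25 keyword half of hybrid search is omitted entirely.

```python
import hashlib
import math
import sqlite3

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: hash each word into a
    fixed-size bag-of-words vector. Real semantics need a real model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def build_index(fragments: dict[str, str]) -> sqlite3.Connection:
    """Index each fragment's body and embedding in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fragments (path TEXT, body TEXT, vec TEXT)")
    for path, body in fragments.items():
        vec = ",".join(map(str, embed(body)))
        db.execute("INSERT INTO fragments VALUES (?, ?, ?)", (path, body, vec))
    return db

def search(db: sqlite3.Connection, query: str, k: int = 3) -> list[str]:
    """Rank fragments by cosine similarity to the query embedding."""
    qvec = embed(query)
    scored = []
    for path, _body, vec in db.execute("SELECT path, body, vec FROM fragments"):
        fvec = [float(x) for x in vec.split(",")]
        scored.append((cosine(qvec, fvec), path))
    return [p for _score, p in sorted(scored, reverse=True)[:k]]
```

The shape matches the reference stack: everything runs in-process, nothing leaves the machine, and swapping the toy `embed` for a real local model is a one-function change.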
Anti-Patterns
| Anti-Pattern | Problem | Instead |
|---|---|---|
| One giant memory file | Wastes context, degrades reasoning quality | Split by domain into fragments |
| A fragment per task | Over-fragmentation, index becomes noise | Keep it domain-level, not task-level |
| Load all fragments at boot | Defeats the entire purpose | Load index only, pull on demand |
| Skip semantic search | Works at 5 files, breaks at 50 | Always search first, read second |
| Duplicate info across fragments | Conflicting sources of truth | Single source + cross-references |
Scaling
Rule of thumb: if you can't scan the fragment map in MEMORY.md in 5 seconds, it's too big. Restructure.
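The "slim index" half of this rule (the <50-line target from the architecture diagram) is also easy to check mechanically. A sketch; the function name and return shape are illustrative:

```python
from pathlib import Path

INDEX_LINE_LIMIT = 50  # MEMORY.md should stay scannable at a glance

def index_health(memory_md: Path) -> tuple[int, bool]:
    """Return (line count, within-limit?) for the MEMORY.md index."""
    n = memory_md.read_text(encoding="utf-8").count("\n") + 1
    return n, n <= INDEX_LINE_LIMIT
```

An index that fails this check is usually a sign that fragment descriptions have grown into fragment content, and belong back in the fragments themselves.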
Comparison to Other Approaches
| Approach | Pros | Cons |
|---|---|---|
| Monolithic memory file | Simple, one file | Wastes context, degrades with scale |
| Database / structured | Queryable, scalable | Loses narrative context |
| RAG over documents | Scales well | Chunk boundaries lose coherence |
| DFCL | Human-readable, agent-editable, associative | Needs maintenance discipline + vector search |
DFCL's advantage is that the agent can both read and write its own memory in a format that's also human-auditable. The files are just markdown. You can open them, edit them, review them. No opaque database, no embeddings-only storage.
Why This Matters
Every AI agent session is a fresh start. The quality of that session depends entirely on:
- What gets loaded into context
- How much of the context window is left free for reasoning
DFCL optimizes both. The result: an agent that accumulates knowledge over time without accumulating bloat. It gets smarter without getting slower.
DFCL was developed while building a hardware business with an AI agent running Claude via OpenClaw. The architecture emerged from practical problems — context windows filling with irrelevant product details, stale information overriding recent decisions, and the general entropy of a single growing memory file. The name came later. The frustration came first.