
Domain Fragment Context Loading

A memory architecture for AI agents with limited context windows.

Nicholas Webb · March 2026

The Problem

AI agents wake up fresh every session. To maintain continuity, they load memory files — but monolithic memory files waste context window on irrelevant information. An agent helping with product colors doesn't need the email pipeline. An agent debugging infrastructure doesn't need product pricing.

Context window is the agent's working memory. Every irrelevant token loaded is a token not available for reasoning. Overloaded context doesn't just waste capacity — it degrades performance. The agent's attention spreads thin across information it doesn't need, reducing the quality of its responses on the thing it does need.

This is the AI equivalent of trying to solve a math problem while someone reads you a novel.

The Solution

Domain Fragment Context Loading (DFCL) organizes agent memory into small, focused files (fragments) connected by cross-references and backed by semantic vector search. It mirrors human associative memory: you don't replay your entire life to answer a question — you recall what's relevant and follow mental links to related concepts.

Two Systems, Like Human Cognition

The human brain has (at least) two relevant memory systems:

  • Long-term memory — stored knowledge, experiences, and lessons learned over time
  • Fluid intelligence — the ability to pull the right knowledge at the right time, make novel connections, and reason with limited working memory

DFCL maps both onto an AI agent's file system:

| Cognitive System | DFCL Implementation | What It Does |
| --- | --- | --- |
| Long-term memory | Domain fragments: focused `.md` files, <200 lines | Stores everything the agent has learned, organized by domain |
| Fluid intelligence | Semantic vector search + cross-references | Surfaces the right fragment at the right time |

Long-term memory without fluid intelligence is a filing cabinet you can't navigate. Fluid intelligence without long-term memory is cleverness with no foundation. You need both.

Architecture

```
workspace/
├── MEMORY.md                    ← Index (slim, <50 lines)
│   ├── Fragment Map             ← domain → file → key info
│   ├── Boot Sequence            ← what to load on startup
│   └── Tooling notes
│
├── memory/
│   ├── fragments/               ← Domain knowledge (curated, stable)
│   │   ├── user-profile.md
│   │   ├── product-a.md
│   │   ├── product-b.md
│   │   ├── infrastructure.md
│   │   ├── e-commerce.md
│   │   ├── marketing.md
│   │   ├── lessons.md
│   │   └── ...
│   │
│   └── YYYY-MM-DD.md            ← Daily notes (raw, chronological)
│       ├── Session logs
│       ├── Decisions made
│       └── Context that may graduate to fragments
│
└── SOUL.md / USER.md / etc.     ← Identity & config (loaded at boot)
```
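A minimal `MEMORY.md` index under this layout might look like the following — the domains and one-line summaries are illustrative, not prescriptive:

```markdown
# MEMORY.md — Fragment Index

## Fragment Map
- `memory/fragments/user-profile.md` — who the user is, preferences, constraints
- `memory/fragments/product-a.md` — product A specs, materials, open issues
- `memory/fragments/infrastructure.md` — servers, deploys, where credentials live
- `memory/fragments/lessons.md` — mistakes made and rules derived from them

## Boot Sequence
1. SOUL.md, USER.md
2. Daily notes (today + yesterday)
3. This index only — never the fragments themselves

## Tooling
- Semantic search over `memory/` before reading files directly
```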

Fragment Rules

1. **Sub-200 lines.** If it's growing past this, split it. The goal is a single focused read, not a chapter.
2. **One domain per fragment.** Products, infrastructure, people, processes — not mixed.
3. **Cross-linked.** Every fragment references related fragments with relative markdown links, forming a navigable knowledge graph.
4. **Self-contained.** A fragment should make sense on its own. Don't require reading three other files to understand one.
5. **Curated, not raw.** Fragments are distilled knowledge. Daily notes are raw logs. Information graduates from daily notes to fragments when it proves durable.
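The size and cross-linking rules are mechanical enough to check automatically. A sketch of such a check — `lint_fragment` is a hypothetical helper, not part of any DFCL tooling:

```python
import re

MAX_LINES = 200
# Relative markdown links to other fragments, e.g. [shopify.md](shopify.md)
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+\.md)\)")

def lint_fragment(text: str) -> list[str]:
    """Check a fragment's text against the DFCL rules; return warnings."""
    warnings = []
    lines = text.splitlines()
    if len(lines) >= MAX_LINES:
        warnings.append(f"{len(lines)} lines: past the sub-200 limit, split by subdomain")
    if not LINK_RE.search(text):
        warnings.append("no cross-references: fragment is disconnected from the graph")
    return warnings
```

Running this over `memory/fragments/` during the maintenance cadence catches oversized or orphaned fragments before they rot.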

Cross-References: The Knowledge Graph

Fragments link to each other the way concepts link in your mind:

```markdown
# Rack Adapter Blocks
**Parent:** [ironclan.md](ironclan.md)
**Buy Flow:** [shopify.md](shopify.md)
**Filament Details:** See Obsidian `3D Workshop/Filament Order.md`
```

This lets the agent walk the graph: start at one domain, follow links to pull in related context only when the task requires it. No link followed = no tokens spent.
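The graph walk above can be sketched as a breadth-first traversal with a load budget (the budget models the "stop early" rule). This is a toy version: fragments live in a dict rather than on disk, and the file names echo the example above:

```python
import re
from collections import deque

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")

def walk_graph(fragments: dict[str, str], start: str, budget: int = 3) -> list[str]:
    """Breadth-first walk over relative markdown links, loading at most `budget` fragments."""
    loaded: list[str] = []
    queue = deque([start])
    seen = {start}
    while queue and len(loaded) < budget:
        name = queue.popleft()
        text = fragments.get(name)
        if text is None:
            continue  # link points outside the fragment set (e.g. an Obsidian vault)
        loaded.append(name)
        for target in LINK_RE.findall(text):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return loaded

fragments = {
    "rack-adapters.md": "Parent: [ironclan.md](ironclan.md)\nBuy: [shopify.md](shopify.md)",
    "ironclan.md": "Products: [rack-adapters.md](rack-adapters.md)",
    "shopify.md": "Store config.",
}
print(walk_graph(fragments, "rack-adapters.md", budget=2))
# → ['rack-adapters.md', 'ironclan.md']
```

With `budget=2` the walk stops before ever loading `shopify.md` — no link followed, no tokens spent.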

Runtime Behavior

Boot Sequence

Every session, fixed cost:

1. `SOUL.md`: Agent identity — who am I?
2. `USER.md`: Human profile — who am I helping?
3. Daily notes: Today + yesterday — what just happened?
4. `MEMORY.md`: The index only — what exists?

The agent knows what knowledge exists without loading all of it. Typical boot cost: ~300-500 lines, regardless of total memory size.

On-Demand Recall

When a question or task arrives:

1. **Semantic search.** Vector embeddings find relevant fragments by meaning, not keywords. "What color is the gold filament?" hits the product fragment even if the query doesn't mention the file name.
2. **Follow links.** If a hit references another fragment, load it if the task needs it.
3. **Stop early.** Load only what's relevant. Two fragments is usually enough. Five is a red flag.

Memory Maintenance

| Cadence | Action |
| --- | --- |
| Every session | Log decisions and outcomes to daily notes |
| Every few days | Review daily notes → graduate durable info to fragments |
| On contradiction | Update immediately — stale fragments are worse than none |
| On growth | Past 200 lines? Split by subdomain |

The Semantic Search Layer

Vector search is what turns organized files into fluid intelligence. Without it, the agent has to know which file to read. With it, the agent describes what it needs and the right context surfaces automatically.

How It Works

1. All fragments and daily notes are chunked and embedded into vector space using a local embedding model.
2. At query time, the agent's question is embedded and compared against stored vectors.
3. Top matches are returned with file path, line numbers, relevance score, and snippet.
4. The agent reads the relevant chunk, following links if needed.
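The shape of that pipeline fits in a few lines. In this sketch a toy bag-of-words "embedding" stands in for a real model (a deployment would call a local embedding model such as nomic-embed-text instead), but the chunk → embed → rank flow is the same:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A real setup would call a
    local embedding model (e.g. nomic-embed-text) here."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(index: dict[str, Counter], query: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank indexed chunks against the query; return top-k (path, score)."""
    q = embed(query)
    scored = sorted(((path, cosine(q, vec)) for path, vec in index.items()),
                    key=lambda x: x[1], reverse=True)
    return scored[:k]

# Illustrative index: one chunk per fragment.
index = {
    "fragments/product-a.md": embed("gold filament color PLA spool rack adapter"),
    "fragments/infrastructure.md": embed("server deploy nginx dns certificates"),
}
top = search(index, "What color is the gold filament?", k=1)
```

The query never names a file, yet the product fragment surfaces — that retrieval-by-meaning is the "fluid intelligence" half of the design.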

Reference Stack

| Component | Tool | Notes |
| --- | --- | --- |
| Embedding model | nomic-embed-text | Local via Ollama, no API calls |
| Search mode | Hybrid | Semantic + keyword (BM25) |
| Vector storage | SQLite | Built-in, no external deps |
| Indexed content | `memory/*.md` | Auto-indexed on write |

The entire search pipeline runs locally. No data leaves the machine. This matters for agents with access to personal information.
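The "SQLite, no external deps" row is less exotic than it sounds: vectors can be stored as blobs and scanned brute-force, which is plenty at DFCL scale (tens of fragments, not millions of documents). A stdlib-only sketch — the schema and values are illustrative, not the actual tooling:

```python
import sqlite3
import struct

def pack(vec: list[float]) -> bytes:
    """Serialize a float vector to a blob for SQLite storage."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (path TEXT, line INTEGER, snippet TEXT, embedding BLOB)")
db.execute("INSERT INTO chunks VALUES (?, ?, ?, ?)",
           ("memory/fragments/product-a.md", 12, "Gold PLA, 1.75mm", pack([0.9, 0.1, 0.0])))

def nearest(db, query_vec, k=3):
    """Brute-force scan: fine at DFCL scale, returns (score, path, line, snippet)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    rows = db.execute("SELECT path, line, snippet, embedding FROM chunks").fetchall()
    scored = [(dot(query_vec, unpack(e)), p, ln, s) for p, ln, s, e in rows]
    return sorted(scored, reverse=True)[:k]
```

The returned tuples carry exactly what the runtime flow above needs: relevance score, file path, line number, and snippet.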

Anti-Patterns

| Don't | Why | Instead |
| --- | --- | --- |
| One giant memory file | Wastes context, degrades reasoning quality | Split by domain into fragments |
| A fragment per task | Over-fragmentation, index becomes noise | Keep it domain-level, not task-level |
| Load all fragments at boot | Defeats the entire purpose | Load index only, pull on demand |
| Skip semantic search | Works at 5 files, breaks at 50 | Always search first, read second |
| Duplicate info across fragments | Conflicting sources of truth | Single source + cross-references |

Scaling

| Fragments | Strategy |
| --- | --- |
| < 15 | Index + manual links work fine |
| 15–40 | Lean more on semantic search, less on index scanning |
| 40+ | Group into subdirectories, add category-level indexes |

Rule of thumb: if you can't scan the fragment map in MEMORY.md in 5 seconds, it's too big. Restructure.

Comparison to Other Approaches

| Approach | Pros | Cons |
| --- | --- | --- |
| Monolithic memory file | Simple, one file | Wastes context, degrades with scale |
| Database / structured | Queryable, scalable | Loses narrative context |
| RAG over documents | Scales well | Chunk boundaries lose coherence |
| DFCL | Human-readable, agent-editable, associative | Needs maintenance discipline + vector search |

DFCL's advantage is that the agent can both read and write its own memory in a format that's also human-auditable. The files are just markdown. You can open them, edit them, review them. No opaque database, no embeddings-only storage.

Why This Matters

Every AI agent session is a fresh start. The quality of that session depends entirely on:

  • What context is available — long-term memory
  • How efficiently it can access the right context — fluid intelligence

DFCL optimizes both. The result: an agent that accumulates knowledge over time without accumulating bloat. It gets smarter without getting slower.

DFCL was developed while building a hardware business with an AI agent running Claude via OpenClaw. The architecture emerged from practical problems — context windows filling with irrelevant product details, stale information overriding recent decisions, and the general entropy of a single growing memory file. The name came later. The frustration came first.