AI · Implementation · Infrastructure

Wiring the Harness

The practical guide to building an AI agent workspace. Every piece, every connection, from cold metal to productive agent.

Nicholas Webb · April 2026

The Idea

In a previous article, I wrote about Domain Fragment Context Loading — a memory architecture that gives AI agents persistent, associative recall without blowing out their context window. That article was about the what and the why.

This one is about the how.

The goal: build a harness that any AI agent can plug into and be productive from its first session. Not just memory — the full stack. Search, skills, version control, remote access, real-world integrations. A workspace, not a chatbot.

Everything here runs locally on one machine. No cloud dependencies for core functionality. No subscriptions beyond the AI model itself.

The Stack

Here's everything the harness is made of. Each section below breaks down the implementation.

Semantic Search: Ollama + nomic-embed-text + sqlite-vec
Memory: DFCL fragments — small markdown files, cross-linked, loaded on demand
Knowledge Base: Obsidian vault — project docs, research, specs, all indexed
Version Control: Git-backed everything — memory, skills, configs, rollback-ready
Skills: Pluggable integrations — Google Drive, Telegram, website deploys
Remote Access: Telegram daemon — talk to your agent from anywhere, phone or wrist

Semantic Search

This is the nervous system. Without it, the agent has to know exactly which file to read. With it, the agent describes what it needs and the right context surfaces automatically.

The stack is three pieces: Ollama serves a local embedding model, sqlite-vec stores and searches the vectors, and a thin Python script ties them together.

Ollama — Local Embeddings

Ollama runs open-source ML models locally. Install it, pull an embedding model, and you have a local embedding API with zero ongoing cost.

# Install Ollama (macOS)
brew install ollama

# Start it as a background service (restarts on login)
brew services start ollama

# Pull the embedding model
ollama pull nomic-embed-text

# API at http://localhost:11434

# Test it
curl http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"test query"}'

I use nomic-embed-text — it's small, fast, and good enough for document search. You could swap in any embedding model Ollama supports. The point is: no API keys, no rate limits, no cost per query. It runs on your hardware.
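
The same endpoint is easy to call from Python. Here's a minimal helper, assuming the requests library; the embed() name is my own, and later sketches reuse it:

# embed helper (sketch)
import requests

def embed(text, model="nomic-embed-text"):
    """Embed one string via the local Ollama API; returns a 768-dim vector."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]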

sqlite-vec — Vector Storage

Vectors need somewhere to live and something to search them. Most guides will point you at Pinecone or Weaviate — hosted vector databases that add cost and complexity. You don't need any of that.

sqlite-vec is a SQLite extension that adds vector similarity search. Your entire knowledge base lives in a single .sqlite file. Back it up by copying the file. Search it with SQL.

# Install sqlite-vec (Python)
pip install sqlite-vec

# In Python:
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect("memory.sqlite")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Create a virtual table for vectors
# (declare cosine so MATCH distances below are cosine, not the L2 default)
db.execute("""
  CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks
  USING vec0(embedding float[768] distance_metric=cosine)
""")

# Store a chunk with its embedding
# (embedding is a list of 768 floats, e.g. from Ollama;
#  serialize_float32 packs it into the blob format vec0 expects)
db.execute(
  "INSERT INTO vec_chunks(rowid, embedding) VALUES (?, ?)",
  [chunk_id, serialize_float32(embedding)]
)

# Search by similarity (cosine distance)
results = db.execute("""
  SELECT rowid, distance
  FROM vec_chunks
  WHERE embedding MATCH ?
  ORDER BY distance
  LIMIT 5
""", [serialize_float32(query_embedding)]).fetchall()

The Search Script

Tie them together with a search script the agent can call. Mine uses hybrid search — vector similarity for meaning plus BM25 keyword matching for precision. The agent calls it like any other command-line tool:

# Search everything
python3 memsearch "how does the BLE handshake work" --top 5

# Search only project docs
python3 memsearch "shopify webhook setup" --source filesystem --top 5

# Search only agent memory logs
python3 memsearch "what did we decide about pricing" --source memory --top 3

Results come back with file path, line numbers, relevance score, and a text snippet. The agent reads the relevant chunk, follows cross-references if needed, and stops. No bulk loading, no wasted context.
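
The BM25 half can live in the same SQLite file via FTS5. Here's a sketch of one way to fuse the two rankings (the fts_chunks table name and the reciprocal rank fusion combiner are my assumptions, not necessarily what the production script does):

# hybrid ranking (sketch): fuse FTS5 keyword hits with vector hits
from sqlite_vec import serialize_float32

def hybrid_search(db, query, query_embedding, k=5):
    # Keyword side: bm25() ranks FTS5 matches, lower = better.
    # (Escape or quote raw user input for FTS5 syntax in real use.)
    kw = db.execute(
        "SELECT rowid FROM fts_chunks WHERE fts_chunks MATCH ? "
        "ORDER BY bm25(fts_chunks) LIMIT 20",
        [query],
    ).fetchall()

    # Vector side: cosine distance from the vec0 table, lower = better.
    vec = db.execute(
        "SELECT rowid FROM vec_chunks WHERE embedding MATCH ? "
        "ORDER BY distance LIMIT 20",
        [serialize_float32(query_embedding)],
    ).fetchall()

    # Reciprocal rank fusion: reward chunks ranked well in either list.
    scores = {}
    for ranking in (kw, vec):
        for rank, (rowid,) in enumerate(ranking):
            scores[rowid] = scores.get(rowid, 0.0) + 1.0 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]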

The indexer runs separately — a script that walks your knowledge base, chunks each file, embeds the chunks through Ollama, and stores them in SQLite. Run it on a schedule or trigger it on file changes. Mine indexes ~500 markdown files in under 30 seconds.
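
A minimal version of that indexer might look like this; it reuses the embed() helper and serialize_float32 from earlier, and the paragraph-packing chunker is illustrative:

# indexer (sketch): walk the tree, chunk, embed, store
from pathlib import Path

CHUNK_CHARS = 1200  # rough chunk size; tune for your model and content

def chunks(text):
    """Greedily pack paragraphs into pieces of roughly CHUNK_CHARS."""
    buf = ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > CHUNK_CHARS:
            yield buf
            buf = ""
        buf = (buf + "\n\n" + para).strip()
    if buf:
        yield buf

def index_tree(db, root):
    db.execute("DELETE FROM vec_chunks")  # naive full re-index
    rowid = 0
    for path in Path(root).expanduser().rglob("*.md"):
        for piece in chunks(path.read_text(encoding="utf-8")):
            db.execute(
                "INSERT INTO vec_chunks(rowid, embedding) VALUES (?, ?)",
                [rowid, serialize_float32(embed(piece))],
            )
            rowid += 1
    db.commit()

A real indexer would also record each chunk's file path and line range in a companion table, which is how results can point back at source files instead of bare row IDs.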

Memory Fragments

The DFCL article covers the concept. Here's the actual file structure:

memory/
├── MEMORY.md              ← Boot index (always loaded, <60 lines)
├── dev-standards.md       ← Coding conventions (loaded before writing code)
├── mobile-app.md          ← App context (loaded when working on the app)
├── roam.md                ← Hardware project (loaded for firmware/BLE work)
├── ml-pipeline.md         ← ML training (loaded for model work)
├── gotchas.md             ← Known pitfalls (loaded before risky changes)
├── skills.md              ← Integration configs (loaded for skill work)
├── business-ops.md        ← Products, pricing, e-commerce
├── people.md              ← User profiles, preferences
├── lessons.md             ← Hard-won engineering rules
└── ...                    ← More as needed

The Boot Index

MEMORY.md is the only file loaded every session. It's a table of contents — fragment name, when to load it, and the file path. The agent reads the index, sees what knowledge exists, and pulls fragments on demand as the conversation requires them.

# MEMORY.md — Boot Index

## Protocol
1. Read this index on session start
2. Load fragments on demand based on conversation context
3. Run memsearch for topics not covered by fragments
4. Update fragments when you learn something stable

## Fragment Index

| Fragment          | When to load                        |
|-------------------|-------------------------------------|
| dev-standards.md  | Before writing ANY code             |
| mobile-app.md     | Working on the app                  |
| roam.md           | BLE firmware, hardware debugging    |
| gotchas.md        | Before risky changes                |
| skills.md         | Integration work (Drive, Telegram)  |

Cross-References

Fragments link to each other with relative markdown links. When the agent loads one fragment and hits a reference to another, it can follow the link — or not, depending on whether the task needs it.

# roam.md — BLE Wrist Display

## Hardware
- MCU: Seeed XIAO nRF52840 (Prototype 2)
- Display: 1.47" rounded LCD (ST7789)
- See also: [roam-cad.md](roam-cad.md) for housing design
- See also: [gotchas.md](gotchas.md) before flashing firmware

## BLE Protocol
- Service UUID: 12345678-1234-...
- Message characteristic: writable, 20 byte MTU
- Chunking: messages >20 bytes split automatically

This creates a knowledge graph. The agent navigates it the way you navigate Wikipedia — start at one topic, follow links when the question demands it, stop when you have enough context.
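
If you want tooling to surface those references (say, for building a link index) rather than relying on the agent's judgment, extraction is a one-liner. A sketch, assuming standard relative markdown links like the ones above:

# extract cross-references from a fragment (sketch)
import re

def fragment_links(markdown_text):
    """Return the relative .md targets of [label](target.md) links."""
    return re.findall(r"\[[^\]]*\]\(([^)\s]+\.md)\)", markdown_text)

Run against the roam.md example, it returns ['roam-cad.md', 'gotchas.md'].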

Git As Infrastructure

Here's an underappreciated move: put your agent's memory in a git repo.

Not because it's a software project. It's not. But git gives you three things for free that are hard to build yourself:

1. Rollback. Agent corrupts a memory fragment? Git revert. You're back to the last good state in seconds. No backup scripts, no snapshots, no "are you sure?" — just revert the commit.

2. Remote backup. Push to GitHub and your agent's entire brain is backed up off-machine. Laptop dies? Clone and you're back. This also means you can run agents on different machines against the same memory.

3. History. Git log is a timeline of how your agent's knowledge evolved. When did it learn about the new API? When did the pricing change? Blame the file and find out. The commit history becomes a memory about memory.

# Initialize memory repo
cd ~/.agent/memory
git init
git add -A && git commit -m "initial memory state"

# Add remote backup
git remote add origin git@github.com:you/agent-memory.git
git push -u origin main

# Agent updates a fragment during a session:
#   1. Edit the file
#   2. git add memory/roam.md
#   3. git commit -m "update BLE protocol notes after P2 testing"
#   4. git push

# Something goes wrong? Roll back:
git log --oneline memory/roam.md
git revert abc1234

The repo doesn't need to be a coherent software project. It doesn't need CI, tests, or a build step. It's just files under version control. Git doesn't care what's in the files — it tracks changes. That's exactly what you want for agent memory.

Worktrees for Parallel Agents

If you run multiple agent sessions simultaneously (and you should — one researching while another codes), git worktrees prevent them from stepping on each other. Each agent gets its own branch and working directory from the same repo:

# Agent A works in the main checkout
# Agent B gets a worktree:
git worktree add .worktrees/agent-b -b agent-b-feature

# Agent B works in .worktrees/agent-b/
# No conflicts. Independent branches.
# Merge when both are done.

The Knowledge Base

Memory fragments are the agent's curated, distilled knowledge. But there's a larger body of raw information the agent needs access to — project specs, research notes, meeting notes, design decisions, build logs.

I keep all of that in an Obsidian vault — a folder of markdown files with wiki-links and YAML frontmatter. But the tool doesn't matter. Any folder of text files works. The point is: you have a single directory tree that contains everything the agent might need to know, and the semantic search layer indexes all of it.

~/Documents/Knowledge/            ← or wherever you keep notes
├── Projects/
│   ├── MyApp/
│   │   ├── Spec.md                ← indexed and searchable
│   │   ├── Known Issues.md
│   │   └── Architecture.md
│   ├── Hardware/
│   │   ├── BOM.md
│   │   └── Build Log.md
│   └── Research/
│       └── Market Analysis.md
├── People/
│   └── Contacts.md
└── Reference/
    ├── Pricing.md
    └── Vendor Notes.md

The indexer crawls this tree, chunks each file, and stores embeddings in the SQLite database. When the agent searches for "BLE handshake timing", it finds the relevant chunk in your hardware build log — even if the agent has never seen that file before. The knowledge base is the long-term memory. Fragments are the working notes. Semantic search connects them.

Skills — Pluggable Integrations

An agent that can only read and write files is useful. An agent that can also push to Google Docs, deploy a website, and message you on Telegram is a coworker.

Skills are self-contained integration packages. Each one is a folder with a manifest file and some scripts. The agent discovers them on startup, knows when to use them based on the manifest description, and calls them like tools.

Skill Structure

~/.agent/skills/
├── gdrive/
│   ├── SKILL.md              ← manifest (name, description, permissions)
│   ├── scripts/
│   │   ├── gdrive.py         ← list, upload, create-doc, read-doc, write-doc
│   │   └── gdrive_auth.py    ← OAuth token setup
│   └── secrets/
│       └── oauth-token.json  ← auto-refreshing credentials
│
├── telegram/
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── telegram.py       ← send, poll, read messages
│   │   └── daemon.py         ← always-on listener (more on this below)
│   ├── config.json           ← bot token, contact IDs
│   └── state.json            ← polling offset
│
└── website/
    ├── SKILL.md
    └── (uses git push → Vercel auto-deploy)

The Manifest

Each skill's SKILL.md file tells the agent what the skill does, when to use it, and what tools it's allowed to invoke:

---
name: gdrive
description: >
  Read, write, list, and upload files to Google Drive and
  Google Docs. Use when the user mentions Drive, Docs,
  uploading documents, or shared folders.
user-invocable: true
allowed-tools: Bash(python3:*), Read, Write
argument-hint: "[action] [args...]"
---

## Commands
- list [folder-id]
- upload <file-path> [folder-id]
- create-doc <title> [folder-id]
- write-doc <doc-id> <markdown-content>
- read-doc <doc-id>

The agent reads this manifest at startup. When a user says "put that in Google Docs", the agent matches it to the skill description, invokes the script, and handles the integration. No hardcoding, no custom routing logic. Add a new skill by dropping a folder in the skills directory.

Skills are agent-agnostic. The scripts are just Python (or Node, or Bash). Any AI agent that can run shell commands can use them. The manifest is the contract — swap the agent, keep the skills.
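
Discovery can be as simple as globbing for manifests and parsing their frontmatter. A sketch, assuming PyYAML and the directory layout above:

# discover skills at startup (sketch)
from pathlib import Path
import yaml  # PyYAML

def discover_skills(skills_dir="~/.agent/skills"):
    """Map skill name -> parsed SKILL.md frontmatter."""
    skills = {}
    for manifest in Path(skills_dir).expanduser().glob("*/SKILL.md"):
        text = manifest.read_text(encoding="utf-8")
        # Frontmatter sits between the first two '---' markers
        _, frontmatter, _ = text.split("---", 2)
        meta = yaml.safe_load(frontmatter)
        skills[meta["name"]] = meta
    return skills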

Remote Access — Telegram

An agent stuck behind a terminal is an agent you can only use at your desk. That defeats half the purpose.

The harness includes a Telegram daemon — a Python script that long-polls the Telegram Bot API, routes incoming messages to persistent agent sessions, and sends responses back. You message your bot from your phone. The agent responds. Full tool access, full memory, no browser needed.

How It Works

1. Create a Telegram bot. Talk to @BotFather on Telegram, get a bot token. Free, takes 30 seconds.

2. Run the daemon. A Python script long-polls getUpdates on the bot token. When a message arrives, it routes by chat ID to a per-user agent session.

3. Persistent sessions. Each user gets a deterministic session ID (UUID5 from their name). The agent resumes the same session every time — full conversation history, memory intact.

4. Response routing. Agent output goes back through the bot API to the user's Telegram chat. Long responses get chunked at the 4096-character limit.

# daemon.py (simplified core loop)
# Helpers elided: poll_telegram and send_telegram are sketched below;
# run_agent wraps your agent's CLI; contacts maps chat IDs to user names;
# NAMESPACE is any fixed UUID namespace, and offset starts at 0.
while True:
    updates = poll_telegram(bot_token, offset, timeout=30)
    for update in updates:
        chat_id = update["message"]["chat"]["id"]
        text = update["message"]["text"]

        # Route to per-user agent session
        session_id = uuid5(NAMESPACE, contacts[chat_id])

        # Run the agent with full permissions
        response = run_agent(
            prompt=text,
            session_id=session_id,
            permissions="bypassPermissions"
        )

        # Send response back to Telegram
        send_telegram(bot_token, chat_id, response)
        offset = update["update_id"] + 1
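
The two Telegram helpers in that loop are thin wrappers over the Bot API's getUpdates and sendMessage methods. A sketch using requests, with error handling and retries omitted:

# telegram helpers (sketch)
import requests

API = "https://api.telegram.org/bot{token}/{method}"

def poll_telegram(bot_token, offset, timeout=30):
    """Long-poll getUpdates; blocks up to `timeout` seconds for new messages."""
    resp = requests.get(
        API.format(token=bot_token, method="getUpdates"),
        params={"offset": offset, "timeout": timeout},
        timeout=timeout + 10,  # outlive the server-side long poll
    )
    return resp.json().get("result", [])

def send_telegram(bot_token, chat_id, text):
    """Send a message, chunked at Telegram's 4096-character limit."""
    for i in range(0, len(text), 4096):
        requests.post(
            API.format(token=bot_token, method="sendMessage"),
            data={"chat_id": chat_id, "text": text[i:i + 4096]},
            timeout=30,
        )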

Keep It Running

On macOS, register the daemon as a LaunchAgent so it starts on login and restarts on crash:

<!-- ~/Library/LaunchAgents/com.yourname.ai-daemon.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.yourname.ai-daemon</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/path/to/daemon.py</string>
  </array>
  <key>KeepAlive</key>
  <true/>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>

On Linux, use a systemd service. On Windows, a scheduled task or NSSM wrapper. The mechanism doesn't matter — just make sure the daemon survives reboots.
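
For reference, a minimal systemd unit for the same job might look like this (unit name and paths are placeholders):

# /etc/systemd/system/ai-daemon.service
[Unit]
Description=AI agent Telegram daemon
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/daemon.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

# Enable and start:
#   sudo systemctl enable --now ai-daemon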

Rapid Prototyping

The harness isn't just for memory and messaging. It's a development environment. With the right frameworks in the agent's toolkit, you go from idea to deployed app in a single session.

Pick frameworks the agent already knows deeply and that minimize the gap between "code written" and "thing running."

Web apps: Next.js. Full-stack React, API routes, deploy to Vercel in one push.

Mobile apps: Expo / React Native. Cross-platform, OTA updates, EAS builds from the command line.

APIs: Express or Hono. Lightweight, deploy anywhere, the agent writes these fluently.

Static sites: Next.js (again). SSG mode, free Vercel hosting, the agent handles the whole flow.

The key is: the agent should be able to scaffold, build, test, and deploy without you typing commands. If your framework choice requires manual steps, IDE plugins, or GUI configuration, it's a bad fit for agent-driven development. Everything must be scriptable.

With or Without a Gateway

Some setups route all agent interactions through a gateway service — a local server that manages sessions, handles tool calls, and proxies API requests. Tools like OpenClaw, Open Interpreter, and AutoGPT work this way.

You don't need one.

A gateway adds value when you're orchestrating multiple models or need a custom tool-calling protocol. But for a single agent with direct tool access, it's a middleman. The harness architecture works with any agent that can read files and run shell commands. That's it. The two minimum requirements.

Direct agent + harness: Single agent, full tool access, simplest path. Agent reads memory files, runs scripts, commits to git directly.

Agent + Telegram daemon: Same as above, but accessible remotely. The daemon is just a message router, not an orchestrator.

Gateway (OpenClaw, etc.): Multi-model orchestration, custom tool protocols, team setups where multiple agents share state through a coordinator.

Start without the gateway. You can always add one later. The harness doesn't depend on it.

Putting It Together

Here's the full wiring diagram — every piece and how they connect:

┌─────────────────────────────────────────────────────────┐
│                     AI AGENT                            │
│  (Claude, GPT, local model — anything with tool access) │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   Reads: MEMORY.md → loads fragments on demand          │
│   Calls: memsearch → semantic search over knowledge     │
│   Runs:  skills/* → Drive, Telegram, deploys            │
│   Uses:  git → commits memory changes, pushes backup    │
│                                                         │
└────────┬──────────────┬──────────────┬──────────────────┘
         │              │              │
    ┌────▼────┐   ┌─────▼─────┐  ┌────▼────┐
    │ Memory  │   │  Search   │  │ Skills  │
    │ (git)   │   │  Layer    │  │         │
    ├─────────┤   ├───────────┤  ├─────────┤
    │ DFCL    │   │ Ollama    │  │ gdrive  │
    │ frags   │   │ nomic-    │  │ tg bot  │
    │ .md     │   │ embed     │  │ website │
    │ files   │   │ sqlite-   │  │ deploy  │
    │         │   │ vec       │  │         │
    └────┬────┘   └─────┬─────┘  └────┬────┘
         │              │              │
         ▼              ▼              ▼
    GitHub        Knowledge       Google,
    (backup)      Base            Telegram,
                  (Obsidian/      Vercel
                   markdown)      (external)

Bootstrap Sequence

To set up the harness from scratch:

1. Install Ollama. Pull nomic-embed-text. Verify the API responds.

2. Set up the memory repo. Create MEMORY.md and your first fragments. Git init, add remote.

3. Build the search index. Write or adapt an indexer that chunks your knowledge base, embeds via Ollama, stores in sqlite-vec.

4. Create the memsearch script. A CLI tool the agent can call to search. Takes a query, returns ranked results.

5. Add skills as needed. Start with what you use most. Drive? Telegram? Slack? One folder per integration.

6. Write the onboarding doc. A single file that tells any new agent: here's where everything lives, here's how to search, here's what's available. (An example follows this list.)

7. Point your agent at it. Configure the agent to read MEMORY.md on startup. That's the only hard requirement. Everything else, it discovers.
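
Step six deserves an example. A minimal onboarding doc might read like this (contents illustrative; adapt to your layout):

# ONBOARDING.md (read this first)

1. Read memory/MEMORY.md. It's the boot index; load fragments on demand.
2. Search before you guess: python3 memsearch "<query>" --top 5
3. Skills live in ~/.agent/skills/. Read each SKILL.md for commands.
4. Memory is a git repo. Commit fragment updates with clear messages, then push.
5. If a topic isn't in a fragment, the knowledge base probably has it. Search.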

The Plug-and-Play Test

Here's how you know the harness is working: swap your agent.

Point a completely different AI — different model, different provider, different interface — at the same memory directory and tell it to read MEMORY.md. If it can find your projects, search your knowledge base, use your skills, and pick up where the last agent left off, the harness works.

That's the whole point. The AI is replaceable. The harness is the product. Your knowledge, your integrations, your workflows — they survive any model change, any provider switch, any pricing shakeup. You're not locked in. The agent is a tenant. The harness is the building.

This harness was built over three months while running a hardware business, training clients, and shipping software — all with AI agents as the primary workforce. It started as a text file and grew into infrastructure. The architecture described here is running in production today, handling everything from 3D print estimates to firmware debugging to this article you're reading now.