
OpenClaw Architecture Deep Dive

Alexander Liu · Feb 18, 2026

This article is based on a reading of the OpenClaw source code and the Lex Fridman × Peter Steinberger podcast.

In VISION.md, Peter defines explicitly what he considers OpenClaw to be:

"OpenClaw is primarily an orchestration system: prompts, tools, protocols, and integrations. TypeScript was chosen to keep OpenClaw hackable by default."

The Journey of a Message

A message first passes through the Telegram channel adapter — OpenClaw's extensions/ directory contains about 20 such adapters, covering WhatsApp, Telegram, Discord, Feishu, Zalo, Nostr, and even IRC. The adapter normalizes the platform-specific message format into a unified MsgContext, then hands it to the core system.

Next comes a round of preprocessing: image understanding, link prefetching, slash command parsing, and session state loading. The message then enters the lane queue for serial execution. When its turn comes, the system dynamically assembles the system prompt, calls the LLM API, runs the agentic tool loop, and finally routes the reply back along the same path it came from.

20 Channels, 1 Convergence Point (High Extensibility)

Whether a message comes from WhatsApp, Telegram, Slack, or WebChat, it ultimately reaches the same function: dispatchInboundMessage(). The adapter normalizes the platform-specific message format into a unified MsgContext — the core logic never needs to know what a WhatsApp webhook payload looks like or how to parse a Telegram update object. Adding a new channel (say LINE or Twitch) only requires writing an adapter plugin, without touching a single line of core code.
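A minimal sketch of that convergence pattern, with heavily simplified assumptions — the real MsgContext has over 70 fields, and the actual signatures of the adapters and dispatchInboundMessage() will differ:

```typescript
// Hypothetical, simplified MsgContext — the real struct is far larger.
interface MsgContext {
  channel: string;
  senderId: string;
  body: string;
}

// Each adapter owns the platform-specific shape it receives.
interface TelegramUpdate {
  message: { from: { id: number }; text: string };
}

// Adapter: normalize a Telegram update into the unified MsgContext.
function normalizeTelegram(update: TelegramUpdate): MsgContext {
  return {
    channel: "telegram",
    senderId: String(update.message.from.id),
    body: update.message.text,
  };
}

// Single convergence point: core logic only ever sees MsgContext.
function dispatchInboundMessage(ctx: MsgContext): string {
  return `[${ctx.channel}] ${ctx.senderId}: ${ctx.body}`;
}
```

Adding a new channel means adding one more `normalizeX` adapter; nothing downstream of the dispatch function changes.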

However, what gets "flattened" is only the format layer. Capability differences between channels are injected into the system prompt downstream, letting the agent know the behavioral boundaries of the current channel.

The Triple Identity of a Message

The same user message is split into three versions within the system:

const ctx: MsgContext = {
  Body: parsedMessage,           // Raw original for UI display
  BodyForAgent: stampedMessage,  // Timestamp-injected version for the LLM
  BodyForCommands: commandBody,  // For the command parser; may include /think prefix
};

"Only BodyForAgent gets the timestamp — Body stays raw for UI display."

This detail is small, but it reveals an important point: when your agent needs to serve three consumers simultaneously — UI display, LLM reasoning, and command parsing — "one message" isn't enough. Most agent frameworks don't have this design, because they only have one consumer (the LLM).

Dual-Layer Lane Queue

Every LLM call is wrapped in two layers of serial queues:

return enqueueSession(() =>        // session lane: serial within the same session
  enqueueGlobal(async () => {      // global lane: concurrency control within lane type
    // ... LLM API call
  })
);

The session lane ensures a user's messages are never processed in parallel — imagine sending three messages in quick succession: the second must wait for the first's agent turn to complete, because they share the same conversation history. This layer is per-session: different users' sessions each have their own queue, never blocking each other.

The outer "global lane" is not a single global bottleneck. The system defines four independent lanes: Main, Cron, Subagent, and Nested, each with its own concurrency counter and queue. User messages go through the Main lane, scheduled tasks through Cron, sub-agents through Subagent — they run fully in parallel, never competing with each other. Concurrency control only happens within the same lane: for example, the Main lane defaults to maxConcurrent = 1, meaning only one LLM call executes at any given moment across all user sessions.

The session lane isolates message ordering within a single user; the lane-type queue controls total concurrency for similar tasks. The combined effect: messages queue first by session, then by task type.
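The two layers can be sketched as promise chaining per session plus a counting semaphore per lane type. This is an illustrative reconstruction under assumed names and shapes — the real implementation lives in command-queue.ts and is more involved:

```typescript
// Session lane: chain each task onto the previous task for that session,
// so a user's messages are processed strictly in order.
const sessionTails = new Map<string, Promise<unknown>>();

function enqueueSession<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
  const tail = sessionTails.get(sessionId) ?? Promise.resolve();
  const next = tail.then(task, task); // run even if the predecessor failed
  sessionTails.set(sessionId, next);
  return next;
}

// Global lane: a counting semaphore; each lane type (Main, Cron,
// Subagent, Nested) would get its own instance.
class Lane {
  private running = 0;
  private waiting: Array<() => void> = [];
  constructor(private maxConcurrent = 1) {}

  async enqueue<T>(task: () => Promise<T>): Promise<T> {
    if (this.running >= this.maxConcurrent) {
      // Park until a running task finishes and releases a slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      this.waiting.shift()?.(); // wake the next waiter, if any
    }
  }
}
```

With `maxConcurrent = 1` on the Main lane, two different sessions can each have a task queued, but only one LLM call runs at a time — exactly the behavior described above.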

Core Modules

1. Channel Adapters: 20 Entry Points, One Exit

The extensions/ directory contains about 20 channel adapters — WhatsApp, Telegram, Discord, Slack, Feishu, Zalo, Nostr, IRC, and even cron scheduled tasks and email. Each adapter does one thing: convert platform-specific message formats into the unified MsgContext struct.

This means the core system is completely agnostic to where a message originates. A WhatsApp group chat and a cron-triggered email sorting task follow the same code path. The adapter also handles return routing — replies always travel back along the channel they came from.

The product of normalization is MsgContext — a unified struct containing over 70 fields. These fields cover message content, sender identity chain, media attachments, session tracking, authorization state, and more.

Channels also handle differences in @mention syntax across platforms (Slack uses <@U123>, Discord uses <@!123>), as well as thread reply mode differences. If these "small" differences leaked into the core, they would pollute the entire codebase.
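An illustrative normalizer for the mention syntax above — the patterns and the neutral output form are my assumptions, not OpenClaw's actual regexes:

```typescript
// Normalize platform mention syntax into a neutral form before the
// core sees it. Illustrative only; real adapters do more.
function normalizeMentions(channel: "slack" | "discord", text: string): string {
  switch (channel) {
    case "slack":
      // Slack: <@U123> -> @U123
      return text.replace(/<@([A-Z0-9]+)>/g, "@$1");
    case "discord":
      // Discord: <@!123> or <@123> -> @123
      return text.replace(/<@!?(\d+)>/g, "@$1");
  }
}
```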

2. Preprocessing: Before the LLM Sees the Message

After entering the core, messages go through a round of preprocessing before being sent to the LLM, split into four steps:

Media Understanding (applyMediaUnderstanding): Attachments like images, audio, and video are processed upfront. For example, audio is transcribed to text, images generate descriptions. This way the LLM receives structured text, not raw binary.

Link Prefetching (applyLinkUnderstanding): If a message contains URLs, the system fetches page content in advance and injects summaries into the context. The LLM doesn't need to fetch anything itself.

Directive Parsing (resolveReplyDirectives): Slash commands (like /reset, /model) are identified and executed at this step, never reaching the LLM. The command system is larger than you'd expect — the system has over a hundred built-in CLI commands.

Session Initialization (initSessionState): Loads conversation history, checks whether this is a new session, and handles identity resolution in group mode.

As messages enter the next stage, each Session has its own independent Follow-up Queue. When a user sends multiple messages in rapid succession, this queue determines how to handle the backlog — queue them sequentially (followup), merge-and-steer (steer), or collect-and-summarize (collect).
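The three policies can be sketched as a single dispatch over the backlog. The mode names come from the article; the merge and summarize behavior shown here is a simplified assumption:

```typescript
type FollowupMode = "followup" | "steer" | "collect";

// Decide how a session's backlog of rapid-fire messages becomes turns.
function drainBacklog(mode: FollowupMode, backlog: string[]): string[] {
  switch (mode) {
    case "followup": // process each queued message as its own turn
      return backlog;
    case "steer":    // merge the backlog into one steering message
      return [backlog.join("\n")];
    case "collect":  // collapse into a single summarized turn (placeholder summary)
      return [`${backlog.length} queued messages: ${backlog.join("; ")}`];
  }
}
```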

3. Lane Queue: Trading Serialization for Correctness

The preprocessed FinalizedMsgContext enters the lane queue (command-queue.ts) — the heart of the entire system. Each lane defaults to maxConcurrent = 1, meaning strict serialization. The system defines four lanes: Main, Cron, Subagent, and Nested.

This is the Factorio thinking Peter talks about — each layer is an independent optimization stage. Serialization by default guarantees correctness; concurrency is only unlocked where it's confirmed safe.

The takeaway: concurrency is not something to pursue by default. In an agent system, two messages simultaneously modifying the same file is a disaster. Serialization is the safest default; concurrency is a conscious opt-in.

4. Dynamic System Prompt: The Agent's Runtime "Self-Awareness"

When a message's turn comes for execution, the system dynamically assembles the system prompt. It is not hardcoded text; it is generated in real time from the current environment.

Runtime info: Agent ID, hostname, OS, CPU architecture, Node version, current model, shell type, originating message channel. The agent literally "knows who it is and where it's running."

Tool inventory: A complete list and functional summary of currently available tools. Different agents and channels may have different tool sets.

SOUL.md: If this file exists, an instruction is injected — "embody its persona and tone." This is the technical implementation of what Peter calls "giving the agent a personality."

Skills list: Descriptions of currently installed skills, telling the agent which CLI tools it can use (e.g., himalaya for email, ordercli for food ordering).

Memory recall directive: Tells the agent to search memory files with memory_search before answering questions about historical information.

Safety section: Explicitly prohibits the agent from self-preservation, replication, and permission escalation. Also includes SILENT_REPLY_TOKEN — what Peter called "I gave him an option to shut up."

The system prompt distinguishes three modes: full (main agent, complete injection), minimal (sub-agent, only tools and workspace info), and none (only basic identity line). This directly reflects the multi-agent design principle — sub-agents don't need to and shouldn't know the "global" context.

SILENT_REPLY_TOKEN has a concrete engineering implementation: during reply normalization, if the silent token is detected and there are no media attachments, the reply is marked with skip reason "silent" and discarded. The agent can proactively choose not to bother the user — this isn't simple empty reply filtering, but an intentionally designed silence mechanism.
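A sketch of that silence check during reply normalization — the token value, field names, and function name here are placeholders, not OpenClaw's actual ones:

```typescript
// Placeholder token value; the real token is defined by OpenClaw.
const SILENT_REPLY_TOKEN = "<silent>";

interface NormalizedReply {
  text: string;
  attachments: string[];
  skipReason?: "silent";
}

function normalizeReply(text: string, attachments: string[] = []): NormalizedReply {
  // Only suppress when the silent token is present AND there is no
  // media to deliver — otherwise the reply still goes out.
  if (text.trim() === SILENT_REPLY_TOKEN && attachments.length === 0) {
    return { text: "", attachments, skipReason: "silent" };
  }
  return { text, attachments };
}
```

The key design point is the conjunction: a silent token alongside a media attachment still produces a delivered reply.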

The execution safety parameters form a three-dimensional configuration space: execution environment (sandbox/gateway/node) × security level (deny/allowlist/full) × approval policy (off/on-miss/always) = 27 combinations. These parameters are persisted in the session, meaning different conversations can have different security boundaries.
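The three axes map naturally onto union types. The axis values come from the article; the type and field names are mine:

```typescript
// The three configuration axes: 3 × 3 × 3 = 27 combinations.
type ExecEnvironment = "sandbox" | "gateway" | "node";
type SecurityLevel = "deny" | "allowlist" | "full";
type ApprovalPolicy = "off" | "on-miss" | "always";

interface ExecSafetyConfig {
  environment: ExecEnvironment;
  security: SecurityLevel;
  approval: ApprovalPolicy;
}

const ENVIRONMENTS: ExecEnvironment[] = ["sandbox", "gateway", "node"];
const LEVELS: SecurityLevel[] = ["deny", "allowlist", "full"];
const POLICIES: ApprovalPolicy[] = ["off", "on-miss", "always"];

// Enumerate the full configuration space, e.g. for validation or docs.
const allConfigs: ExecSafetyConfig[] = ENVIRONMENTS.flatMap((environment) =>
  LEVELS.flatMap((security) =>
    POLICIES.map((approval) => ({ environment, security, approval }))
  )
);
```

Because each combination is persisted per session, two concurrent conversations can sit at opposite corners of this space — say, sandbox/deny/always versus node/full/off.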

5. Agentic Loop: tool_use → execute → loop

After the system prompt is assembled, the message along with conversation history is sent to the LLM API. The LLM's response may contain tool_use calls — for example, calling exec to run shell commands, calling himalaya to read email, or calling memory_search to query memory.

After each tool_use returns a result, the result is appended to conversation history and sent to the LLM again. This loop repeats until the LLM returns a plain text reply (no tool_use), indicating task completion.

This is the core of the agentic loop — the LLM doesn't answer questions in one shot, but completes tasks through multiple rounds of tool calls.
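The loop described above can be sketched in a few lines. The message shapes, the callLLM and runTool callbacks, and the turn cap are all assumptions for illustration:

```typescript
interface ToolCall { name: string; input: string }
interface LLMResponse { text?: string; toolCalls?: ToolCall[] }
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Minimal agentic loop: call the LLM, run any requested tools, append
// the results to history, and repeat until a plain-text reply arrives.
async function runAgentLoop(
  history: Message[],
  callLLM: (h: Message[]) => Promise<LLMResponse>,
  runTool: (c: ToolCall) => Promise<string>,
  maxTurns = 10
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await callLLM(history);
    if (!res.toolCalls?.length) return res.text ?? ""; // plain text: done
    for (const call of res.toolCalls) {
      history.push({ role: "assistant", content: `tool_use:${call.name}` });
      // Each tool_result is appended before the next LLM round.
      history.push({ role: "tool", content: await runTool(call) });
    }
  }
  throw new Error("max turns exceeded");
}
```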

The compaction mechanism (compaction.ts) handles context-window management. When token usage approaches the limit (e.g., 176K used of a 200K window), the system inserts a special agentic turn — prompting the agent: "you're about to be compacted, save important information to disk now." The agent can then write key decisions, TODOs, and context to memory files.

There's a subtle issue — if compaction removes a tool_use message but its corresponding tool_result remains, you get orphaned tool_result messages. The system runs a dedicated repair pass (repairToolUseResultPairing, Anthropic/Google only) to clean up orphans, ensuring structural integrity of the message history.
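The cleanup idea behind repairToolUseResultPairing can be sketched as follows — the message shape and id fields are simplified assumptions, not the provider wire format:

```typescript
interface HistMsg {
  type: "text" | "tool_use" | "tool_result";
  id?: string;        // set on tool_use messages
  toolUseId?: string; // set on tool_result messages
  content: string;
}

// Drop any tool_result whose matching tool_use was compacted away,
// so the history stays structurally valid for the provider API.
function repairToolPairing(history: HistMsg[]): HistMsg[] {
  const liveToolUseIds = new Set(
    history.filter((m) => m.type === "tool_use").map((m) => m.id)
  );
  return history.filter(
    (m) => m.type !== "tool_result" || liveToolUseIds.has(m.toolUseId)
  );
}
```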

Default flow: token usage approaches limit → memory flush round → continue running → hit context overflow → auto-compact.

6. Multi-Agent: The Subagent System

OpenClaw's multi-agent capability isn't an abstract concept — it's a concrete engineering implementation. The main agent can create sub-agents via the sessions_spawn tool, each with its own independent session, conversation history, and tool set.

Key constraints: each sub-agent runs in its own session with its own conversation history and tool set, and receives only the minimal system prompt (tools and workspace info).

The main agent can view, steer, or terminate sub-agents via subagent tools. sessions_send enables cross-session communication.

7. Memory: Markdown + Vector Search

The memory system implements persistent memory with a two-layer structure:

File layer: MEMORY.md and memory/*.md are plain-text Markdown files stored in the agent's workspace. The agent can directly read and write these files.

Vector layer: The embedding manager creates vectorized indexes of memory files, supporting semantic search. The memory_search tool returns the most relevant snippets, and the memory_get tool pulls specific lines.

Indexing process:

  1. Discover memory files — MEMORY.md and memory/*.md under the workspace
  2. Watch for file changes with chokidar, debounce 1.5 seconds after detecting changes
  3. Split markdown into 400-token chunks with 80-token overlap between adjacent chunks, preserving original line numbers
  4. Generate embeddings for each chunk. Supports four providers: OpenAI (text-embedding-3-small), Voyage AI, Google Gemini, or local models (node-llama-cpp running GGUF models)
  5. Write embedding vectors to three SQLite tables: chunks (raw text + metadata), chunks_vec (vectors for sqlite-vec), chunks_fts (full-text index for FTS5)
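Step 3 — fixed-size chunks with overlap while preserving line numbers — can be illustrated with a toy chunker. The sizes come from the article, but here a "token" is approximated as a whitespace-separated word, which the real pipeline does not do:

```typescript
interface Chunk { text: string; startLine: number }

// Split lines into overlapping chunks of `size` tokens, recording the
// original line number where each chunk starts.
function chunkMarkdown(lines: string[], size = 400, overlap = 80): Chunk[] {
  // Flatten into (word, lineNumber) pairs so line info survives chunking.
  const words: Array<{ w: string; line: number }> = [];
  lines.forEach((text, i) =>
    text.split(/\s+/).filter(Boolean).forEach((w) => words.push({ w, line: i + 1 }))
  );
  const chunks: Chunk[] = [];
  const step = size - overlap; // adjacent chunks share `overlap` tokens
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + size);
    chunks.push({ text: slice.map((x) => x.w).join(" "), startLine: slice[0].line });
    if (start + size >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated index entries.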

Search process:

  1. Agent calls memory_search with query text
  2. The query text also gets an embedding generated
  3. Two parallel search paths: vector search (sqlite-vec cosine distance) + keyword search (FTS5 BM25 ranking)
  4. Hybrid merge: weighted scoring — vector weight 0.7, keyword weight 0.3
  5. Filter: minimum score threshold 0.35, snippet truncation at 700 characters, with citations
  6. Return to Agent, inject into context
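Steps 4 and 5 can be sketched as a weighted merge. The 0.7/0.3 weights and 0.35 threshold come from the article; the assumption that both score lists arrive pre-normalized to [0, 1] is mine — real cosine distances and BM25 scores would need scaling first:

```typescript
interface Hit { id: string; score: number }

// Hybrid merge: combine vector and keyword hits with fixed weights,
// then filter by minimum score and sort best-first.
function hybridMerge(
  vectorHits: Hit[],
  keywordHits: Hit[],
  minScore = 0.35
): Hit[] {
  const merged = new Map<string, number>();
  for (const h of vectorHits) merged.set(h.id, (merged.get(h.id) ?? 0) + 0.7 * h.score);
  for (const h of keywordHits) merged.set(h.id, (merged.get(h.id) ?? 0) + 0.3 * h.score);
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .filter((h) => h.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```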

When memory is used: Only when the Agent actively calls memory_search. The system prompt contains a hardcoded instruction telling the Agent: "before answering questions about historical information, search with memory_search first." But this is prompt-level "enforcement," not code-level automatic injection.

In other words, if the Agent judges that the current question doesn't require checking memory, it can skip calling memory_search, and memory content won't enter the context. Memory is not background knowledge injected with every request — it's an on-demand retrieval tool, and the Agent decides when to use it.