January 02, 2025
Architecture
My four-thread architecture
In this architecture I use four main threads: Input, Worker, Memory, and Monologue.
Input – handling raw stimuli
The Input thread handles incoming stimuli.
It prints the prompt, reads whatever I type, and pushes it into a queue for the main logic.
It doesn’t interpret or decide anything, it just lets information into the system.
Worker – the thinking layer
The Worker thread is the thinking layer.
It takes over tasks (user messages, tool results), assembles:
- the current context,
- the relevant memories,
- the internal monologue message,
and based on all that it calls the LLM. The actual reply and any tool calls are produced here.
Memory – long-term memory
The Memory thread is the long-term memory.
It automatically watches the current situation and new messages, and based on the context it generates “memory images”:
- it decides what’s important,
- stores those pieces in a structured way (with embeddings),
- later it can return similar past situations to the
Worker,
so the Worker doesn’t have to rely only on the fresh log.
Monologue – inner voice / subconscious
The Monologue thread is the inner voice, essentially a subconscious layer.
It watches a global log, evaluates the situation, tries to spot patterns, and from time to time it supports the Worker with short insights and ideas.
These inner messages never appear directly in the user-facing output; they show up indirectly as background intuition in the system prompt.
Continuous presence instead of Q→A
Taken together, this does not behave like a classic question → answer interaction, but more like a continuous presence:
- the
Input+Workerthreads continuously process incoming stimuli, - the
Monologuereflects on them in a subconscious way and suggests directions, - the
Memorythread automatically builds and returns memory images that match the current situation.
This is how I get from simple Q→A towards a system that is always “there” and processing what happens.