January 05, 2025
Memory
How the memory thread works
1. Applied Concepts (Definitions)
The memory architecture is built on the following technical and functional concepts. Its goal is to automatically structure and retrieve experiential data.
Operational Elements
Background Memory Loop
A continuously running program thread (memory_thread.py), independent of the main execution flow. It monitors the conversation without interfering; by default, it checks the active context every 5 seconds for changes.
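A minimal sketch of such a loop, assuming a polling design. The class, callback, and helper names are illustrative; the source only specifies the file name memory_thread.py, the 5-second interval, and the 6-message snippet:

```python
import threading

POLL_INTERVAL = 5   # seconds between context checks, per the text
SNIPPET_SIZE = 6    # last N messages form the Context Snippet

class MemoryLoop(threading.Thread):
    """Background thread that watches the conversation without blocking it."""

    def __init__(self, get_messages, on_change):
        super().__init__(daemon=True)      # daemon: dies with the main process
        self.get_messages = get_messages   # callable returning the message list
        self.on_change = on_change         # callback fired on new interactions
        self._last_len = 0
        self._stop = threading.Event()

    def _tick(self):
        messages = self.get_messages()
        if len(messages) != self._last_len:           # a new interaction arrived
            self._last_len = len(messages)
            self.on_change(messages[-SNIPPET_SIZE:])  # hand over the snippet

    def run(self):
        while not self._stop.is_set():
            self._tick()
            self._stop.wait(POLL_INTERVAL)

    def stop(self):
        self._stop.set()
```

Running as a daemon thread keeps the loop from blocking process shutdown, matching the "monitors without interfering" requirement.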
Context Snippet
The fundamental unit of analysis. Instead of examining the full conversation history, the system only evaluates the last 6 interactions (messages), ensuring resource efficiency and focused processing.
Extraction
A process where a dedicated LLM analyzes the Context Snippet and extracts the predefined data structure from raw text.
Conscious Commit
An explicit, tool-triggered memory write. Unlike the automatic background process, this stores information intentionally when it is deemed important.
The Data Structure (The 4 Dimensions)
Every stored memory record contains four mandatory components in the database:
Essence
A factual, concise summary of the event or information. This field is used for vector-based semantic search.
Dominant Emotions
Three emotion labels, selected from a predefined taxonomy, that describe the emotional tone of the situation.
Memory Weight
A floating-point value between 0.0 and 1.0 indicating the memory’s importance (0.0 = noise, 1.0 = critically important).
The Lesson
An action-oriented takeaway. It does not describe the past, but prescribes guidance for future behavior.
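The four mandatory dimensions can be expressed as a record type. This is a sketch with assumed field names; the source defines the semantics and the weight range but not the exact schema:

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    """One stored memory: the four mandatory dimensions (field names assumed)."""
    essence: str                   # factual summary; source of the embedding
    dominant_emotions: list[str]   # exactly three labels from the taxonomy
    memory_weight: float           # 0.0 (noise) .. 1.0 (critically important)
    lesson: str                    # forward-looking guidance, not a description

    def __post_init__(self):
        if not 0.0 <= self.memory_weight <= 1.0:
            raise ValueError("memory_weight must lie in [0.0, 1.0]")
        if len(self.dominant_emotions) != 3:
            raise ValueError("exactly three emotion labels are required")
```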
2. Operational Workflow: Storage and Consolidation
Memory storage is a multi-step automated process that runs in the background in parallel with user interaction.
1. Monitoring & Extraction
The MemoryLoop constantly observes the active room state. When a new interaction occurs, it extracts the last 6 messages (Context Snippet). An LLM analyzes this snippet and returns the extracted data (Essence, Emotions, Weight, Lesson) in JSON format.
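An illustrative example of the JSON payload the extraction LLM might return for this step (the content and exact key names are assumptions; the source only lists the four fields):

```json
{
  "essence": "User asked to delete old log files; agent confirmed before removing them.",
  "dominant_emotions": ["caution", "trust", "relief"],
  "memory_weight": 0.7,
  "lesson": "Always confirm before destructive file operations."
}
```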
2. Vectorization (Embedding)
From the extracted Essence, the system generates a 768-dimensional vector using Google’s text-embedding-005 model. This mathematical representation enables semantic similarity comparisons.
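A hedged sketch of this step. The actual model call (Vertex AI's text-embedding-005) is injected as `embed_fn` so the example stays self-contained; only the 768-dimension check comes from the text:

```python
EMBED_DIM = 768  # dimensionality of text-embedding-005, per the text

def embed_essence(essence, embed_fn):
    """Turn an Essence string into its vector representation.

    `embed_fn` wraps the real embedding call; injecting it keeps this
    sketch runnable without network access or SDK assumptions.
    """
    vector = embed_fn(essence)
    if len(vector) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(vector)}")
    return [float(x) for x in vector]
```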
3. Deduplication & Validation
Before writing to the database, the system checks pgvector for similar memories. It compares the new vector with existing vectors via cosine similarity. If the similarity exceeds a configured threshold (e.g., 0.92), the information is treated as a duplicate.
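The core of the check is plain cosine similarity against a threshold. In production this comparison happens inside pgvector; the pure-Python version below shows the same logic (function names are illustrative, the 0.92 threshold is the example value from the text):

```python
import math

DUP_THRESHOLD = 0.92  # example threshold from the text

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a,b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicate(new_vec, stored):
    """Return the id of the most similar stored memory if it crosses
    the threshold, else None. `stored` is an iterable of (id, vector)."""
    best_id, best_sim = None, 0.0
    for mem_id, vec in stored:
        sim = cosine_similarity(new_vec, vec)
        if sim > best_sim:
            best_id, best_sim = mem_id, sim
    return best_id if best_sim >= DUP_THRESHOLD else None
```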
4. Persistence (SQL Operation)
Depending on the deduplication result:
Reinforcement (UPDATE)
For duplicates, the system increases the memory’s access_count and updates the last-access timestamp.
Insertion (INSERT)
If no similar memory exists, a new record is created in the memories table.
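The branch above can be sketched as a single routing function. Table and column names (`memories`, `access_count`, `last_accessed`, etc.) are assumptions consistent with the text; `execute` stands in for a real database cursor:

```python
def persist(record, duplicate_id, execute):
    """Route the record to UPDATE (reinforcement) or INSERT based on the
    deduplication result. `execute(sql, params)` is the DB call."""
    if duplicate_id is not None:
        # Reinforcement: bump the counter and refresh the timestamp.
        execute(
            "UPDATE memories SET access_count = access_count + 1, "
            "last_accessed = now() WHERE id = %s",
            (duplicate_id,),
        )
    else:
        # Insertion: brand-new memory record.
        execute(
            "INSERT INTO memories (essence, emotions, weight, lesson, embedding) "
            "VALUES (%s, %s, %s, %s, %s)",
            (record["essence"], record["emotions"], record["weight"],
             record["lesson"], record["embedding"]),
        )
```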
3. Emotional & Content-Based Recall (Hybrid Search)
Memory retrieval is not a simple SQL query but a multi-layer algorithm combining semantic and emotional relevance.
Content-Based Search (Semantic Vector)
At the database level, the system compares the vector of the current situation’s Essence with stored vectors using cosine similarity. This ensures meaning-based matching rather than keyword matching (e.g., “delete” ≈ “remove”).
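A query of this kind might look as follows in pgvector, where `<=>` is the cosine-distance operator (so similarity = 1 − distance). Table and column names are assumptions:

```sql
-- Hedged sketch: schema names assumed; only the cosine-similarity
-- ranking itself is described in the text.
SELECT id, essence, 1 - (embedding <=> %(query_vec)s) AS similarity
FROM memories
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
```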
Emotional Search (Empathy Bonus)
At the Python layer, the system compares the current dominant emotions with those stored in candidate memories. Overlapping emotion labels grant a bonus score, prioritizing experiences that match the emotional tone of the current situation.
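The bonus reduces to counting overlapping labels. A minimal sketch; the per-label bonus value is an assumption, since the source only says overlaps grant a bonus:

```python
def empathy_bonus(current_emotions, memory_emotions, per_match=0.1):
    """Score bonus proportional to the number of shared emotion labels."""
    overlap = set(current_emotions) & set(memory_emotions)
    return per_match * len(overlap)
```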
Final Ranking (Scoring)
Both search dimensions are merged into a weighted formula to compute the final Relevance Score:
Score = (SemanticSimilarity × 0.45) + (OtherFactors) + EmotionBonus
This ensures that retrieved memories are logically relevant and emotionally aligned with the current context.
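The ranking step above can be sketched directly from the formula. Only the 0.45 semantic weight comes from the text; `other_factors` stands in for the terms the source leaves unspecified (e.g. memory weight or recency), and the candidate-dict keys are illustrative:

```python
SEMANTIC_WEIGHT = 0.45  # coefficient taken from the formula in the text

def relevance_score(semantic_similarity, other_factors, emotion_bonus):
    """Merge both search dimensions into the final Relevance Score."""
    return semantic_similarity * SEMANTIC_WEIGHT + other_factors + emotion_bonus

def rank(candidates):
    """Sort candidate memories (dicts holding the three score inputs) best-first."""
    return sorted(
        candidates,
        key=lambda m: relevance_score(m["semantic"], m["other"], m["bonus"]),
        reverse=True,
    )
```

Note how a strong emotion bonus can outrank a purely semantic match, which is exactly the "emotionally aligned" behavior the section describes.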