NARE: An LLM agent that amortizes reasoning into memory and executable rules
# NARE — Non-parametric Amortized Reasoning Evolution

**A Skill-Based Cognitive Architecture for Deterministic Routing of Logic Tasks via Semantic Compression and Executable Reflexes**

Quickstart • Architecture • Features • Benchmarks • Русский • Citation

## Overview

NARE is an experimental hierarchical-cache architecture for LLM reasoning. It pairs an LLM (default: Gemma-3-27B via Google Generative AI) with an episodic memory and a registry of executable skills compiled from past reasoning trajectories. A 4-way router dispatches each query to the cheapest viable layer: an exact cache, a sandboxed Python skill, a delta-reasoning step over a similar past episode, or a full Tree-of-Thoughts pass.

**What this is.** A research/engineering prototype, not a benchmarked system. The interesting parts are: (1) AST-validated execution of LLM-generated skills, (2) a sleep/REM consolidation loop that compiles repeated patterns into Python, (3) decomposed (trigger / execute / stress) confidence scoring, and (4) a maturity / shadow-check lifecycle for promoted skills.

**What this is not.** This repository does not include results on standard reasoning benchmarks (HumanEval+, MATH, GSM8K, BIG-Bench Hard, AlfWorld, WebArena). The Free-Energy / active-inference / Bayesian-model-reduction / topological framings in earlier drafts are conceptual inspirations, not formal claims about what the code computes — see LIMITATIONS.md.
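Point (1), AST-validated execution, can be sketched as follows. This is a minimal illustration, not the repository's actual validator — `validate_skill` and its allow/deny lists are hypothetical names chosen for the example:

```python
import ast

# Node types and call names rejected in a generated skill: the idea is to
# allow pure computation only, and refuse imports, dunder access, and
# exec/eval-style escape hatches before anything is executed.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal)
FORBIDDEN_CALLS = {"exec", "eval", "open", "__import__", "compile"}

def validate_skill(source: str) -> bool:
    """Return True iff `source` parses and contains no forbidden constructs."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            return False
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Name) and fn.id in FORBIDDEN_CALLS:
                return False
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False
    return True

print(validate_skill("def add(a, b):\n    return a + b"))  # True
print(validate_skill("import os\nos.system('ls')"))        # False
```

Only skills that pass this static check would be handed to the sandboxed `exec` layer; everything else falls through to a slower path.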
## 🏗 Architecture

```
┌─────────────────────────────┐
│         Input Query         │
└──────────┬──────────────────┘
           │
┌──────────▼──────────────────┐
│     Semantic Embedding      │
│   (gemini-embedding-001)    │
└──────────┬──────────────────┘
           │
┌──────────▼──────────────────────────────────┐
│            4-WAY DYNAMIC ROUTER             │
│                                             │
│  ┌──────────┐   Exact    ┌──────────────┐   │
│  │ Layer 0  ├──match────►│  FAST CACHE  │   │
│  └────┬─────┘            │  (0 tokens)  │   │
│       │ no match         └──────────────┘   │
│  ┌────▼─────┐  trigger   ┌──────────────┐   │
│  │ Layer 1  ├──hit──────►│    REFLEX    │   │
│  │          │            │  (0 tokens,  │   │
│  └────┬─────┘            │  O(1) exec)  │   │
│       │ no trigger       └──────────────┘   │
│  ┌────▼─────┐   sim>τ    ┌──────────────┐   │
│  │ Layer 2  ├───────────►│    HYBRID    │   │
│  │          │            │(δ-reasoning) │   │
│  └────┬─────┘            └──────────────┘   │
│       │ sim<τ                               │
│  ┌────▼─────┐            ┌──────────────┐   │
│  │ Layer 3  ├───────────►│     SLOW     │   │
│  └──────────┘            │  (Tree-of-   │   │
│                          │   Thoughts)  │   │
│                          └──────────────┘   │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                MEMORY SYSTEM                │
│                                             │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐   │
│ │Episodic │ │ Semantic │ │   Factual    │   │
│ │ (FAISS) │ │ (Skills) │ │    (RAG)     │   │
│ └─────────┘ └──────────┘ └──────────────┘   │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐   │
│ │  Graph  │ │    RL    │ │    Neural    │   │
│ │ Memory  │ │Retriever │ │(Titans/MIRAS)│   │
│ └─────────┘ └──────────┘ └──────────────┘   │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│             SLEEP CONSOLIDATION             │
│                                             │
│ ┌─────────────┐      ┌──────────────────┐   │
│ │    NREM     │      │       REM        │   │
│ │ (cluster +  │─────►│ (stress-test +   │   │
│ │  compile)   │      │  repair skills)  │   │
│ └─────────────┘      └──────────────────┘   │
│                                             │
│ ┌─────────────────────────────────────────┐ │
│ │ Meta-Abduction (cross-domain transfer)  │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
```

### Cognitive Workflow

```
 Novel Problem        Recurring Problem       Mature Skill
       │                      │                    │
   SLOW Path              FAST Cache          REFLEX Path
   (60+ sec)             (~0.01 sec)         (~0.001 sec)
       │                      │                    │
Tree-of-Thoughts         Exact Match         Python exec()
 + HybridCritic           Retrieval          Zero API cost
       │                      │                    │
       …
```
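The dispatch logic in the router diagram can be sketched as a fallthrough over the four layers. This is a toy illustration under stated assumptions: `similarity` is a word-overlap stand-in for embedding cosine similarity, `delta_reason` and `tree_of_thoughts` are placeholders for the real LLM calls, and none of these names come from the NARE codebase:

```python
from dataclasses import dataclass, field

SIM_THRESHOLD = 0.85  # the τ in the diagram; illustrative value

def similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def delta_reason(query: str, episode: tuple) -> str:
    return f"adapted[{episode[1]}]"   # placeholder: one cheap LLM delta step

def tree_of_thoughts(query: str) -> str:
    return f"ToT[{query}]"            # placeholder: the expensive full search

@dataclass
class Router:
    cache: dict = field(default_factory=dict)     # Layer 0: exact-match answers
    skills: list = field(default_factory=list)    # Layer 1: (trigger_fn, skill_fn)
    episodes: list = field(default_factory=list)  # Layer 2: (past_query, answer)

    def route(self, query: str) -> tuple:
        if query in self.cache:                   # Layer 0: FAST, 0 tokens
            return "FAST", self.cache[query]
        for trigger, skill in self.skills:        # Layer 1: REFLEX, 0 tokens
            if trigger(query):
                return "REFLEX", skill(query)
        best = max(self.episodes, default=None,   # Layer 2: HYBRID if sim > τ
                   key=lambda e: similarity(query, e[0]))
        if best and similarity(query, best[0]) > SIM_THRESHOLD:
            return "HYBRID", delta_reason(query, best)
        return "SLOW", tree_of_thoughts(query)    # Layer 3: full ToT fallback
```

The ordering is the point: each layer is only consulted when every cheaper layer has declined, so token cost is paid only for genuinely novel queries.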
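The NREM half of the consolidation loop — cluster repeated episodes, then promote the stable clusters for compilation — can be sketched like this. Purely illustrative: the first-token signature and `min_count` threshold below are assumptions for the example, not how NARE actually clusters (which works over embeddings):

```python
from collections import defaultdict

def nrem_consolidate(episodes: list, min_count: int = 3) -> dict:
    """Toy NREM pass: bucket (query, answer) episodes by a crude pattern
    signature and return buckets frequent enough to compile into a skill."""
    clusters = defaultdict(list)
    for query, answer in episodes:
        words = query.split()
        signature = words[0] if words else ""   # first token as the "pattern"
        clusters[signature].append((query, answer))
    # Candidate skills: patterns repeated often enough to amortize a compile.
    return {sig: eps for sig, eps in clusters.items() if len(eps) >= min_count}

episodes = [("reverse abc", "cba"), ("reverse xy", "yx"),
            ("reverse hello", "olleh"), ("sum 1 2", "3")]
candidates = nrem_consolidate(episodes)  # only the "reverse" pattern survives
```

In the full loop, each surviving cluster would be handed to the REM stage, which stress-tests the compiled skill and repairs it before promotion.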