WeSearch

Sage-Wiki: An LLM-compiled personal knowledge base



Original article
GitHub

sage-wiki

An implementation of Andrej Karpathy's idea for an LLM-compiled personal knowledge base. Developed using Sage Framework. Some lessons learned after building sage-wiki here.

Drop in your papers, articles, and notes. sage-wiki compiles them into a structured, interlinked wiki, with concepts extracted, cross-references discovered, and everything searchable.

- Your sources in, a wiki out. Add documents to a folder. The LLM reads, summarizes, extracts concepts, and writes interconnected articles.
- Scales to 100K+ documents. Tiered compilation indexes everything fast, compiles only what matters. A 100K vault is searchable in hours, not months.
- Compounding knowledge. Every new source enriches existing articles. The wiki gets smarter as it grows.
- Works with your tools. Opens natively in Obsidian. Connects to any LLM agent via MCP. Runs as a single binary; nothing to install beyond the API key.
- Ask your wiki questions. Enhanced search with chunk-level indexing, LLM query expansion, and re-ranking. Ask natural language questions and get cited answers.
- Compile on demand. Agents can trigger compilation for specific topics via MCP. Search results signal when uncompiled sources are available.

Demo video (sage-wiki.mp4): Dots on the outer boundary represent summaries of all documents in the knowledge base, while dots in the inner circle represent concepts extracted from the knowledge base, with links showing how those concepts connect to one another.

Install

    # CLI only (no web UI)
    go install github.com/xoai/sage-wiki/cmd/sage-wiki@latest

    # With web UI (requires Node.js for building frontend assets)
    git clone https://github.com/xoai/sage-wiki.git && cd sage-wiki
    cd web && npm install && npm run build && cd ..
    go build -tags webui -o sage-wiki ./cmd/sage-wiki/

Supported Source Formats

| Format | Extensions | What gets extracted |
|---|---|---|
| Markdown | .md | Body text with frontmatter parsed separately |
| PDF | .pdf | Full text via pure-Go extraction |
| Word | .docx | Document text from XML |
| Excel | .xlsx | Cell values and sheet data |
| PowerPoint | .pptx | Slide text content |
| CSV | .csv | Headers + rows (up to 1000 rows) |
| EPUB | .epub | Chapter text from XHTML |
| Email | .eml | Headers (from/to/subject/date) + body |
| Plain text | .txt, .log | Raw content |
| Transcripts | .vtt, .srt | Raw content |
| Images | .png, .jpg, .gif, .webp, .svg | Description via vision LLM (caption, content, visible text) |
| Code | .go, .py, .js, .ts, .rs, etc. | Source code |

Just drop files into your source folder; sage-wiki detects the format automatically. Images require a vision-capable LLM (Gemini, Claude, GPT-4o).

Quickstart

Greenfield (new project)

    mkdir my-wiki && cd my-wiki
    sage-wiki init

    # Add sources to raw/
    cp ~/papers/*.pdf raw/papers/
    cp ~/articles/*.md raw/articles/

    # Edit config.yaml to add api key, and pick LLMs

    # First Compile
    sage-wiki compile

    # Search
    sage-wiki search "attention mechanism"

    # Ask questions
    sage-wiki query "How does flash attention optimize memory?"

    # Interactive terminal dashboard
    sage-wiki tui

    # Browse in the browser (requires -tags webui build)
    sage-wiki serve --ui

    # Watch folder
    sage-wiki compile --watch

Vault Overlay (existing Obsidian vault)

    cd ~/Documents/MyVault
    sage-wiki init --vault

    # Edit config.yaml to set source/ignore folders, add api key, pick LLMs

    # First Compile
    sage-wiki compile

    # Watch the vault
    sage-wiki compile --watch

Docker

    # Pull from GitHub Container Registry
    docker pull ghcr.io/xoai/sage-wiki:latest

    # Or from Docker Hub
    docker pull xoai/sage-wiki:latest

    # Run with your wiki directory mounted
    docker…
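
The supported source formats listed above suggest a lookup from file extension to format. The excerpt only says that sage-wiki "detects the format automatically" and does not say how, so the following is a minimal Go sketch that assumes detection keys off the file extension. The names `formatByExt` and `detectFormat` are hypothetical and are not taken from the sage-wiki codebase.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// formatByExt mirrors the supported-formats list from the README excerpt.
// Hypothetical: this map is an illustration, not sage-wiki's actual code.
// The README's "Code" row ends in "etc.", so the code entries are partial.
var formatByExt = map[string]string{
	".md":   "Markdown",
	".pdf":  "PDF",
	".docx": "Word",
	".xlsx": "Excel",
	".pptx": "PowerPoint",
	".csv":  "CSV",
	".epub": "EPUB",
	".eml":  "Email",
	".txt":  "Plain text",
	".log":  "Plain text",
	".vtt":  "Transcripts",
	".srt":  "Transcripts",
	".png":  "Images",
	".jpg":  "Images",
	".gif":  "Images",
	".webp": "Images",
	".svg":  "Images",
	".go":   "Code",
	".py":   "Code",
	".js":   "Code",
	".ts":   "Code",
	".rs":   "Code",
}

// detectFormat returns the format name for a path based on its extension,
// compared case-insensitively. It returns "" when the extension is unknown.
func detectFormat(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	return formatByExt[ext]
}

func main() {
	paths := []string{
		"raw/papers/attention.PDF",
		"raw/articles/notes.md",
		"raw/misc/data.bin",
	}
	for _, p := range paths {
		format := detectFormat(p)
		if format == "" {
			format = "unsupported"
		}
		fmt.Printf("%s -> %s\n", p, format)
	}
}
```

A real implementation might also sniff file contents (for example, magic bytes) instead of trusting the extension alone; the sketch leaves that out to stay close to what the format list states.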

This excerpt is published under fair use for community discussion. Read the full article at GitHub.
