
I built an open-source tool to distill books into knowledge graphs

#opensource #llm #cli #productivity #knowledgegraph
TL;DR (AI summary)

The author created SpineDigest, an open-source CLI tool that converts books in EPUB, Markdown, or plain text format into structured knowledge graphs using LLMs, addressing limitations like context window constraints and lack of conceptual relationships in traditional summaries. The tool processes books in three stages: extracting knowledge chunks, building a semantic graph, and generating adversarially reviewed summaries. It allows reprocessing without re-calling the LLM by saving data in a .sdpub file, and includes a companion app, Inkora, for visualization. The developer seeks feedback on chunking quality, especially for complex or repetitive texts.

Cookcoco · Posted on Apr 28

I have a bad habit: I buy books faster than I read them. Not because I'm lazy — I start most of them. But somewhere around chapter 3, I lose the thread. I forget what chapter 1 said, I'm not sure how the concepts connect, and by the time I finish, I can't reconstruct the structure of what I just read.

The obvious fix is "just take better notes." But I've tried that. The problem isn't the notes — it's that I don't know which parts matter until I've read the whole thing, at which point I've already forgotten the beginning.

So I built SpineDigest: an open-source CLI that processes a book (EPUB, Markdown, or plain text) through an LLM pipeline and produces a structured knowledge graph — not just a summary.

## Why not just ask ChatGPT to summarize it?

I tried that first. The problems:

- **Context window limits** — most books are 80k–200k tokens. Even with large-context models, you're either truncating or paying a lot.
- **No structure** — a flat summary loses the relationships between ideas. You get a paragraph, not a map.
- **No re-exportability** — if you want a different format or focus later, you run the whole thing again.

SpineDigest takes a different approach.

## How it works

The pipeline has three stages.

### Stage 1: Chunk extraction

The book is split into sections and fed to an LLM one section at a time — simulating how a person reads. For each section, the model extracts discrete knowledge units ("chunks"): self-contained facts, arguments, or concepts worth preserving.
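The article doesn't show SpineDigest's splitter, but the sectioning idea — cut at paragraph boundaries and pack greedily up to a size budget so each LLM call sees one coherent slice — can be sketched as follows. This is a minimal illustration; the function name and the character budget are my assumptions, not the tool's API:

```javascript
// Illustrative sketch (not SpineDigest's code): split plain text into
// LLM-sized sections at paragraph boundaries, so no call exceeds the
// context budget and no paragraph is cut in half.
function splitIntoSections(text, maxChars = 8000) {
  const paragraphs = text.split(/\n\s*\n/);
  const sections = [];
  let current = "";
  for (const para of paragraphs) {
    // Start a new section if adding this paragraph would overflow the budget.
    if (current && current.length + para.length + 2 > maxChars) {
      sections.push(current);
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) sections.push(current);
  return sections;
}
```

Each section would then be sent to the model with an extraction prompt that asks for self-contained chunks, per the stage description above.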
This sidesteps the context window problem and tends to produce cleaner output than asking the model to summarize an entire chapter at once.

### Stage 2: Knowledge graph construction

A classical graph algorithm (not an LLM) clusters the chunks by semantic similarity and builds a graph of how concepts relate across the book. Related chunks are grouped into "snakes" — chains of connected ideas.

This is the part I find most useful. You can see which ideas the author returns to repeatedly, which concepts depend on each other, and where the real weight of the book sits.

### Stage 3: Adversarial summarization

A multi-agent pass where one LLM writes a summary and others ("professors") challenge it against the source material and your stated extraction goal. The summary is revised until it can withstand scrutiny. This is overkill for some books, but for dense technical or academic material it makes a real difference in accuracy.

## Usage

```bash
npm install -g spinedigest
spinedigest --input ./book.epub --output ./digest.md
```

You can also specify what you're looking for:

```bash
spinedigest --input ./book.epub --output ./digest.md \
  --prompt "Focus on system design tradeoffs and architectural patterns"
```

Requires Node.js ≥ 22.12.0 and credentials for a supported LLM provider.

## The .sdpub format

Processing a book takes time and API calls. SpineDigest saves the full knowledge structure — chunks, graph, topology — into a .sdpub archive file alongside the Markdown output. If you want to re-export later (different format,…
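The excerpt describes Stage 2 only at a high level. One common way to realize "cluster chunks by semantic similarity, then chain related chunks" is a threshold graph over embedding vectors: connect chunks whose cosine similarity clears a cutoff, then walk connected components. Everything below (the threshold, the function names) is an assumption for illustration, not SpineDigest's actual algorithm:

```javascript
// Illustrative sketch (assumed, not SpineDigest's code): build a similarity
// graph over chunk embeddings and recover groups of related chunks.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Edge between chunks i and j when their embeddings are similar enough.
function buildGraph(embeddings, threshold = 0.8) {
  const edges = [];
  for (let i = 0; i < embeddings.length; i++)
    for (let j = i + 1; j < embeddings.length; j++)
      if (cosine(embeddings[i], embeddings[j]) >= threshold) edges.push([i, j]);
  return edges;
}

// Connected components of the graph — each one a candidate "snake"
// (chain of related ideas), found by depth-first traversal.
function snakes(n, edges) {
  const adj = Array.from({ length: n }, () => []);
  for (const [a, b] of edges) { adj[a].push(b); adj[b].push(a); }
  const seen = new Array(n).fill(false);
  const groups = [];
  for (let s = 0; s < n; s++) {
    if (seen[s]) continue;
    const group = [], stack = [s];
    seen[s] = true;
    while (stack.length) {
      const v = stack.pop();
      group.push(v);
      for (const w of adj[v]) if (!seen[w]) { seen[w] = true; stack.push(w); }
    }
    groups.push(group);
  }
  return groups;
}
```

With embeddings `[[1, 0], [0.9, 0.1], [0, 1]]` and the 0.8 threshold, the first two chunks link up and the third stays alone — two groups. A real pipeline would also need to order each component into a chain; that step is omitted here.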

This excerpt is published under fair use for community discussion. Read the full article at DEV Community.
