WeSearch

We Fixed Karpathy’s LLM Wiki - PENgram Is the Typed Knowledge Graph Pipeline Everyone Asked For



Original article
DEV Community

Penfield
Posted on Apr 28 • Originally published at penfieldlabs.substack.com

#ai #aimemory #wiki #obsidian

We recently published an article about the gaps in Karpathy's LLM Wiki pattern. The thesis was simple: wikilinks without relationship types are just lines on a graph. You can see that two notes connect but not how. Does one support the other? Contradict it? Supersede it? That semantic layer is what turns a pile of linked notes into a knowledge graph.

The comments pushed us further. People running production knowledge graphs at scale said typed relationships like supersedes or contradicts fundamentally changed how their agents reason. Others pointed out that structure alone doesn't solve things when connections explode in volume. You need something to actively maintain the graph over time. One thread debated whether the real problem is staleness, not structure, and whether auto-expiry or evaluation agents are the answer.

The common thread: everyone wanted to move past the theory. Typed links are the first step. An Obsidian plugin is another step. But the hard part is taking raw content, extracting entities, classifying relationships, and producing a graph you can actually query. That's a pipeline. We didn't have one ready. Now we do.

Graphify showed us the architecture

Before we built anything, we studied what already existed. Graphify by Safi Shamsi is an open-source codebase-to-knowledge-graph tool that does many things well. Its three-pass architecture is clean and smart:

1. Deterministic extraction: tree-sitter parses code into AST nodes and edges. No LLM needed, no hallucination risk, fast.
2. Local processing: Whisper transcribes audio and video into text. Runs on your machine, no API calls.
3. LLM semantic extraction: an LLM reads the text and identifies entities, concepts, and relationships. Local or remote, your choice.

The SHA256 incremental caching is elegant: it reprocesses only the files that changed. The Leiden community detection finds clusters in the graph. The interactive HTML visualization lets you explore results in a browser.

What Graphify doesn't do is type its relationships. An edge between two nodes is calls, imports, or semantically_similar_to. That works for code analysis. It doesn't work when you need to know that one research paper contradicts another, or that a concept supersedes an older one. That's where we decided to pick up the ball and run with it.

PENgram: Parse, Extract, Normalize

PENgram takes Graphify's architectural patterns and rebuilds the pipeline around a typed relationship vocabulary: the same 24 types we discussed in the previous article. The name is a nod to engrams, the memory traces that Wilder Penfield spent his career mapping in the human brain. PENgram builds memory traces from your data.

What goes in

PENgram accepts the messy reality of how knowledge actually lives:

Input                                      How it's processed
Code (25 languages, 37 file extensions)    Tree-sitter AST extraction: classes, functions, imports, call graphs
Markdown, text, HTML                       Direct LLM extraction
PDFs                                       Text extraction via pypdf, then LLM
EPUBs                                      Text…
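To make the typed-relationship idea from the article concrete, here is a minimal sketch of a graph whose edges carry a relationship type. It is not PENgram's implementation: the `RelationType`, `TypedEdge`, and `TypedGraph` names are hypothetical, and only three relationship names taken from the article's own examples are shown, not the full 24-type vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    # Illustrative subset only; the article's 24-type vocabulary is not reproduced here.
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class TypedEdge:
    """A directed edge that records how the source relates to the target."""
    source: str
    relation: RelationType
    target: str


class TypedGraph:
    """Hypothetical in-memory store for typed edges."""

    def __init__(self):
        self._edges = set()

    def add(self, source, relation, target):
        self._edges.add(TypedEdge(source, relation, target))

    def outgoing(self, source, relation):
        # Targets reachable from `source` through edges of one specific type.
        return sorted(
            e.target
            for e in self._edges
            if e.source == source and e.relation is relation
        )

    def superseded(self):
        # Every node that some other node claims to supersede.
        return {e.target for e in self._edges if e.relation is RelationType.SUPERSEDES}


graph = TypedGraph()
graph.add("paper-b", RelationType.CONTRADICTS, "paper-a")
graph.add("paper-c", RelationType.SUPPORTS, "paper-a")
graph.add("concept-v2", RelationType.SUPERSEDES, "concept-v1")

print(graph.outgoing("paper-b", RelationType.CONTRADICTS))  # ['paper-a']
print(graph.superseded())  # {'concept-v1'}
```

With untyped links, all three edges above would look identical. The type is what lets a query ask "what contradicts this?" or "which notes are out of date?" rather than only "what is connected?".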
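The SHA256 incremental caching credited to Graphify above can be sketched in a few lines. This is an assumption about the general technique, not Graphify's or PENgram's actual code: the `file_digest` and `changed_files` helpers and the JSON cache layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def changed_files(paths, cache_file: Path):
    """Return the paths whose content hash differs from the cached one.

    The cache (a hypothetical JSON map of path -> digest) is rewritten on
    every call, so a real pipeline would likely update it only after the
    changed files have been processed successfully.
    """
    try:
        cache = json.loads(cache_file.read_text())
    except FileNotFoundError:
        cache = {}

    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest

    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

On the first run every file counts as changed. On later runs only files whose bytes differ are returned, which is what keeps repeated extraction passes cheap.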

This excerpt is published under fair use for community discussion. Read the full article at DEV Community.
