Improving Local Techdocs for Your AI Coding Agent
The article describes how to structure crawled technical documentation so that AI coding agents can use it. It outlines a two-step page classification (rules first, then a local LLM), followed by embedding the pages and building a knowledge graph that combines explicit hyperlinks with semantic similarity edges. The goal is to filter out non-content pages and keep the information that is useful for AI applications.
- The process begins with a rule-based classification to identify legal and navigation pages.
- Pages that cannot be classified by rules are sent to a local LLM for further classification.
- After classification, pages are embedded using a local sentence transformer model to facilitate faster processing.
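
The two-step classification in the list above could be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the URL patterns, the label names, and the `fake_llm` stand-in are all assumptions made for the example, and a real setup would replace `fake_llm` with a call to a local model.

```python
import re
from typing import Callable, Optional

# Hypothetical rules: label names and URL patterns are illustrative assumptions.
RULES = [
    ("legal", re.compile(r"/(privacy|terms|legal|imprint|cookies?)(/|$)", re.I)),
    ("navigation", re.compile(r"/(sitemap|search|tags?|categories)(/|$)", re.I)),
]


def classify_by_rules(url: str) -> Optional[str]:
    """Return a label if any rule matches the URL, otherwise None."""
    for label, pattern in RULES:
        if pattern.search(url):
            return label
    return None


def classify_page(url: str, text: str, llm_classify: Callable[[str], str]) -> str:
    """Try the cheap rules first; call the model only when no rule matches."""
    label = classify_by_rules(url)
    if label is not None:
        return label
    return llm_classify(text)


def fake_llm(text: str) -> str:
    """Stand-in for a local LLM call (assumption); uses a trivial word count."""
    return "content" if len(text.split()) > 5 else "navigation"


if __name__ == "__main__":
    print(classify_page("https://docs.example.com/privacy", "", fake_llm))
    print(classify_page(
        "https://docs.example.com/guide/install",
        "Run the installer and then restart the service afterwards",
        fake_llm,
    ))
```

Running the rules first means the model is only invoked for pages the rules cannot decide, which keeps the number of LLM calls small.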
Opening excerpt (first ~120 words)
This is the third post in the series about making technical documentation available for use in your AI agent or knowledge base, based on our work on Morsel, a knowledge base that improves itself using AI agents. In the first post I described how we crawl documentation sites, clean the page content, and generate descriptions for images. In the second post I shared practical gotchas we ran into when crawling complete techdocs. Here I want to describe what we do afterwards to structure the crawled documentation further and make it available in a more useful form - for example, for your local AI coding agent. At a high level, we classify pages, embed them with a local model, and then build a knowledge graph that combines explicit hyperlinks with semantic similarity edges.
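
To make the last step concrete, here is a small sketch of a graph that combines explicit hyperlinks with semantic similarity edges. It is an illustration under stated assumptions, not the Morsel implementation: the page names, the two-dimensional toy vectors, the `build_edges` helper, and the 0.8 threshold are all hypothetical, and in practice the vectors would come from the local embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_edges(links, embeddings, threshold=0.8):
    """Map (source, target) pairs to the set of edge kinds connecting them."""
    edges = {}
    # Explicit hyperlinks are directed: source page -> linked page.
    for source, targets in links.items():
        for target in targets:
            if target in embeddings:  # skip links to pages that were filtered out
                edges.setdefault((source, target), set()).add("link")
    # Similarity is symmetric, so each pair is stored once, in alphabetical order.
    pages = sorted(embeddings)
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                edges.setdefault((a, b), set()).add("similar")
    return edges


# Toy data: hypothetical page names and hand-written vectors.
embeddings = {"install": [1.0, 0.0], "setup": [0.9, 0.1], "api": [0.0, 1.0]}
links = {"install": ["setup", "legal"], "setup": ["api"]}
edges = build_edges(links, embeddings)

if __name__ == "__main__":
    for pair, kinds in sorted(edges.items()):
        print(pair, sorted(kinds))
```

Here "install" and "setup" end up connected by both a hyperlink and a similarity edge, while the link to the filtered-out "legal" page is dropped. The all-pairs comparison is quadratic in the number of pages, which is fine for a sketch but would likely need a nearest-neighbour index for a large documentation set.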
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Philip Heltweg.