A file-level tree that lets an LLM reason over a document corpus
PageIndex has introduced a new File System designed for massive-scale document search, allowing reasoning over millions of documents. This system aims to improve upon traditional vector-based retrieval methods that often struggle with context and relevance. The PageIndex File System is now available for enterprise users, with a cloud edition expected soon.
- PageIndex has crossed 26k GitHub stars and serves over 23k cloud users in production.
- The new PageIndex File System allows a single index to reason over millions of documents without the limitations of traditional vector-based methods.
- Classic vector-based retrieval often fails due to limited representation power and the inability to maintain context.
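To make the "file-level tree" idea concrete, here is a minimal sketch of tree-guided navigation over a corpus. It is not PageIndex's implementation or API: the `Node` class, the `navigate` function, the sample corpus, and the file names are all hypothetical, and a toy keyword-overlap score stands in for the judgment an LLM would make at each level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A folder-like node (has children) or a document leaf (no children)."""
    name: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def overlap_score(query: str, summary: str) -> int:
    """Toy stand-in for an LLM relevance judgment: count shared words."""
    return len(set(query.lower().split()) & set(summary.lower().split()))


def navigate(root: Node, query: str) -> list[str]:
    """Descend from the root, picking the best-scoring child at each level.

    Returns the path of node names from the root to the chosen leaf.
    """
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap_score(query, c.summary))
        path.append(node.name)
    return path


# Hypothetical corpus: two folders, three documents, each with a short summary.
corpus = Node("corpus", "all documents", [
    Node("finance", "annual reports earnings revenue filings", [
        Node("acme_10k_2024.pdf", "acme annual report 2024 revenue and risk factors"),
        Node("acme_10k_2023.pdf", "acme annual report 2023 revenue and risk factors"),
    ]),
    Node("legal", "contract lease agreement documents", [
        Node("office_lease.pdf", "office lease agreement terms and rent"),
    ]),
])

path = navigate(corpus, "acme revenue in the 2024 annual report")
print(" / ".join(path))
```

The point of the sketch is the shape of the search: each step compares the query against a handful of summaries at one level of the tree, rather than against every chunk in the corpus at once. In a real system the scoring function would be replaced by a model call, and more than one branch might be explored.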
Opening excerpt (first ~120 words)
PageIndex File System: Massive-Scale Document Search
Published on May 3, 2026
PageIndex Team
PageIndex now scales to millions of documents. Available today for enterprise. Cloud rollout coming soon.
We started PageIndex with one belief: retrieval over long documents should look more like human reading than like semantic similarity search. Since launch, the open-source PageIndex, one of the fastest-growing AI-infra repos on GitHub, has crossed 26k GitHub stars in a few months, hit #1 on GitHub Trending, been selected for the GitHub Secure Open Source Fund, and now serves 23k+ cloud users in production.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at PageIndex.