BEAVER: Enterprise benchmark for LLM Text-to-SQL from private data warehouses
BEAVER is a large-scale enterprise text-to-SQL dataset containing 9128 queries spanning 812 tables across 19 diverse domains. Of these, 7978 queries are publicly released, while the remaining 1150 are held out as a private test set. Queries and databases were collected from private organizations. To facilitate fine-grained evaluation and analysis, we provide annotations for five subtasks (multi-table retrieval, join key detection, column mapping, domain knowledge extraction, and query decomposition) and for three categories of queries: complex queries without domain knowledge, domain-specific queries with minimal complexity, and domain-specific complex queries.
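As a rough illustration of the taxonomy described above, the sketch below encodes the five subtasks and three query categories in Python and derives the held-out split size from the stated counts. The names `SUBTASKS`, `QueryCategory`, and `categorize`, and the two boolean flags, are hypothetical and chosen only for this example; they are not the dataset's actual schema or field names.

```python
from enum import Enum
from typing import Optional

# Counts as stated in the dataset description.
TOTAL_QUERIES = 9128
PUBLIC_QUERIES = 7978
HELD_OUT_QUERIES = TOTAL_QUERIES - PUBLIC_QUERIES  # private test set size

# The five annotated subtasks (identifier spellings are hypothetical).
SUBTASKS = (
    "multi_table_retrieval",
    "join_key_detection",
    "column_mapping",
    "domain_knowledge_extraction",
    "query_decomposition",
)


class QueryCategory(Enum):
    """The three query categories named in the description."""
    COMPLEX_NO_DOMAIN = "complex query without domain knowledge"
    DOMAIN_SIMPLE = "domain-specific query with minimal complexity"
    DOMAIN_COMPLEX = "domain-specific complex query"


def categorize(is_complex: bool, needs_domain_knowledge: bool) -> Optional[QueryCategory]:
    """Map two hypothetical flags onto the three categories.

    Returns None for a query that is neither complex nor domain-specific,
    since the description lists no category for that case.
    """
    if is_complex and needs_domain_knowledge:
        return QueryCategory.DOMAIN_COMPLEX
    if is_complex:
        return QueryCategory.COMPLEX_NO_DOMAIN
    if needs_domain_knowledge:
        return QueryCategory.DOMAIN_SIMPLE
    return None


if __name__ == "__main__":
    print(f"held-out queries: {HELD_OUT_QUERIES}")
    print(categorize(is_complex=True, needs_domain_knowledge=True))
```

How the released annotations actually mark complexity or domain knowledge is not stated in this excerpt, so the two flags are assumptions made for the sake of the example.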