Stop Using LLMs Like Giant Problem Solvers
The article discusses the challenges of using large language models (LLMs) for processing messy data, specifically in transforming compliance PDFs into structured JSON rules. The author shares insights on improving the process by simplifying the agent's tasks and handling data iteratively. Key lessons include preparing source data in advance and separating responsibilities between semantic understanding and mechanical processing.
- The initial approach to using LLMs resulted in inaccuracies and broad rules due to the messy nature of the source data.
- The author found that simplifying the agent's tasks and preparing data upfront significantly improved the output quality.
- Processing documents iteratively allowed for easier inspection, retries, and auditing of the results.
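
The points above can be illustrated with a minimal sketch of a deterministic loop wrapped around an agent call. This is not the author's implementation: the names `process_documents`, `validate_rules`, `Result`, and the `extract` callable are hypothetical, and the rule schema (a JSON list of objects with a string `rule` field) is assumed purely for illustration. The idea is that the model handles only the semantic step (`extract`), while plain code handles iteration, validation, retries, and the audit trail.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    """Audit record for one document (hypothetical structure)."""
    doc_id: str
    rules: Optional[list] = None
    attempts: int = 0
    errors: List[str] = field(default_factory=list)


def validate_rules(raw: str) -> list:
    """Mechanical check, no LLM involved.

    Assumed schema: a JSON list of objects, each with a string 'rule' field.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("rule"), str):
            raise ValueError("each item needs a string 'rule' field")
    return data


def process_documents(
    docs: Dict[str, str],
    extract: Callable[[str], str],
    max_attempts: int = 3,
) -> List[Result]:
    """Process one document at a time so each result can be inspected or retried.

    `extract` stands in for the agent call: it takes prepared source text
    and returns a JSON string.
    """
    results = []
    for doc_id, text in docs.items():
        result = Result(doc_id)
        for attempt in range(1, max_attempts + 1):
            result.attempts = attempt
            try:
                result.rules = validate_rules(extract(text))
                break  # valid output, move on to the next document
            except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
                result.errors.append(str(exc))
        results.append(result)
    return results
```

Because every document gets its own `Result`, a failed document shows up as `rules is None` with its error history attached, rather than silently degrading one large batch output. Note that this only checks structure; judging whether a rule is too broad would still need sampling or a separate review step.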
Opening excerpt (first ~120 words)
Stop Using LLMs Like Giant Problem Solvers
How I turned 100 messy pdfs into structured insights by building a deterministic loop around agents
Clara Chong, May 26, 2026, 6 min read
Image by Wesley Tingey from Unsplash
I recently worked on a feature where I had to transform 100 messy compliance pdfs into structured JSON rules. The brute force approach was obvious: give the agent the source text, explain the task, provide examples, and ask it to generate the rules. Since it was the lowest-hanging fruit, I tried it first.
At a glance, the output looked fine. The output JSON was valid and matched what I expected. But as I was manually sampling the results to check for accuracy, the cracks appeared. Some rules were too broad, others were missed.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Towards Data Science.