30 results for "llms"
LLMs Corrupt Your Documents When You Delegate
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation t…
Yann LeCun: LLMs Are Nearing the End, but Better AI Is Coming (2025)
Yann LeCun, Chief AI Scientist at Meta, believes LLMs are doomed due to their inability to represent the high-dimensional spaces that characterize our world…
Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs
Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (…
Running Local LLMs Offline on a Ten-Hour Flight
I flew from London to Google Cloud Next 2026 in Las Vegas. Ten hours with no in-flight wifi. I used the time to test how far a modern MacBook can carry engineering work on local LLMs alone. Setup A we…
What would be the best OS to run LLMs?
Hi there, I've ordered a mini PC with 128GB of RAM and the AMD AI Max 395. I intend to use it with Proxmox (like my current machine), where I run Windows for some gaming and macOS for my music library …
LLMs Can't Generate Influence
Are people using AI/LLMs in Defense or Secure Environments?
Show HN: Waiting for LLMs Sucks – Give your user a game
Give your user a game while they wait for the LLM to return a result…
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is groun…
Google DeepMind Paper Argues LLMs Will Never Be Conscious | Philosophers said the paper’s argument is sound, but that “all these arguments have been presented years and years ago.”
Show HN: Ragnerock, an AI data analysis tool
Hi HN, I’m Matt Mahowald, and together with my cofounder John, we’re launching the public beta of Ragnerock today. As a data scientist, you spend the majority of your time wrangling data. Even though …
Agents Are Microservices with a Brain
We solved this in 2010. It was called microservices. Now we're making the same mistakes with LLMs…
Does Point Cloud Boost Spatial Reasoning of Large Language Models?
3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite some promising results, the role of point clouds in 3D spatial …
AI Can Find the Code. It Didn't Know How the System Worked
21 bug fixes, two models, same failures. Better LLMs marginally improve things, but still fail on system boundaries and integration.…
Help with historical documents transcriptions
Hey there! I’m currently trying to transcribe some historical data from the NYSE. Specifically, the stock prices and (weekly) volume of a set of stocks. At the moment, I have tried manually transcribing th…
AI prefers resumes written by itself: Self-preferencing in Algorithmic Hiring
As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved on both sides of decision-making processes, ranging from hiring to content moderatio…
Claude-Powered Agent Apparently Deletes Company Database, Debases Itself Further in Confession
AI agents are powered by the same obsequious LLMs as consumer chatbots.…
Mitigating Belief Inertia via Active Intervention in Embodied Agents
Recent advancements in large language models (LLMs) have enabled agents to tackle complex embodied tasks through environmental interaction. However, these agents still make suboptimal decisions and pe…
FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery …
A Systematic Approach for Large Language Models Debugging
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…
Don't Make the LLM Read the Graph: Make the Graph Think
We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanab…
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instabili…
Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach
Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear …
Discovering Agentic Safety Specifications from 1-Bit Danger Signals
Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iterative…
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on lon…
SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing
Refereeing is vital in sports, where fair, accurate, and explainable decisions are fundamental. While intelligent assistant technologies are being widely adopted in soccer refereeing, current AI-assis…
IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance
Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…
FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification
Financial AI systems must produce answers grounded in specific regulatory filings, yet current LLMs fabricate metrics, invent citations, and miscalculate derived quantities. These errors carry direct …
When AI reviews science: Can we trust the referee?
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) of…