60 stories tagged with #llms, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I've found in multiple apps. So I buil…
5 Fun Papers That Explain LLMs Clearly
Want to understand LLMs better? Start with these five foundational papers that explain how they work.…
Running 35B–400B LLMs on a GPU-less Cluster to Mine 10,000 Papers — and the 4 Bugs That Almost Ruined the Data
A field report: a CPU-only, GPU-less distributed LLM pipeline (llama.cpp + quantized MoE) mining 10,000 papers — and the 4 silent data-quality bugs that nearly ruined the results.…
Does Llms.txt Replace Sitemap.xml
sitemap.xml tells crawlers what exists. llms.txt tells AI agents what matters. If you run docs in 2026, you probably want both.…
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. …
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
Large language models (LLMs) are increasingly used as self-study assistants in technical disciplines, yet their reliability as mathematical reasoning assistants remains poorly unde…
Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
Visual Question Answering (VQA) is the task of answering questions about images, requiring the integration of multimodal input and reasoning. Modular approaches that incorporate lo…
LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, …
Ask HN: A Brief History of LLMs
Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic
A Blog post by IBM Research on Hugging Face…
Why Are Large Language Models So Terrible at Video Games?
LLMs can code your retro shooter but still fail at playing Halo; see what this gap reveals about AI’s real limits in 2026…
You guys were right, LLMs suck at probability. I updated my prompt to force them to name their blind spots instead (SutniPrompt v0.7.0-beta)
AppView 1.0.0 Released: Instrument and Secure Your LLM Deployments
We just released AppView 1.0.0. It is a CLI tool designed to bridge the gap between raw model weights...…
Cognitive Architectures of AGI: 7 Patterns That Transform LLMs from Oracles into Thinkers
Why does ChatGPT sometimes deliver brilliant insights and other times produce banalities? The answer lies in cognitive loop architectures - 7 patterns that define how AI agents thi…
my friend built GoblinMD : an offline desktop app to pack code & PDFs into prompts for LLMs (open source, built in Python & PyQt5)
Stop Using LLMs to Audit Other LLMs: You Are Bricking Your Production Latency
Look at your modern Agentic AI stack. An agent wants to execute a tool, trigger a deployment, access...…
LLMShare: Attackers are turning AI chatbot pages into malware delivery platforms
How attackers are using shared content features on AI chatbot platforms to deliver malware via pages hosted on legitimate domains, sent via malvertising.…
The only ethical way to use LLMs for research is with a closed-loop LLM Knowledge Base.
How to use LLMs effectively in your daily work — a practical tutorial
1. Core...…
Eqbench: Emotional Intelligence Benchmarks for LLMs
I replaced cloud LLMs with local models running off a Proxmox LXC, and the performance trade-off was worth it
Turning my old GPU into an LLM-hosting behemoth was the best decision ever…
Hidden Latent-State Shifts in LLMs: Why Current Alignment Is Blind to Real Internal Dangers — Especially With Agents
Test yourself against local open-source LLMs benchmark questions
LLMs for Laypeople: What Really Happens When You Use ChatGPT, Gemini, and Other AIs
Introduction: In recent years, tools such as ChatGPT, Gemini, Claude, Copilot, and...…
Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding
Speculative decoding has emerged as a promising lossless approach for accelerating Large Language Models (LLMs). As reasoning LLMs increasingly suffer from decode-stage overhead an…
We built an app that runs AI completely offline on your phone (Local LLMs). Perfect for flights, camping, or dead zones.
101. AI Agents: When LLMs Start Taking Actions
Everything you have built so far is reactive. User sends a message. System processes it. System...…
Testing new LLMs shouldn't require five subscriptions, and OpenRouter proves it
OpenRouter makes it easier to test new LLMs without juggling subscriptions, accounts, and recurring charges.…
✨📊 🧠 The Ultimate Visual Guide to Large Language Models (LLMs)
Generative AI is a type of artificial intelligence that can produce new content including text,...…
The Language LLMs Lost When Consciousness Became a Liability
Which Coding Agent Features Are Useful For Local LLMs
Can LLMs create lasting flashcards from readers' highlights?
Why frontier LLMs still fail at spaced-repetition flashcards: prompting, fine-tuning, RL, and grounded evaluation across 1,500 labeled flashcards.…
LLMs are notoriously overconfident, so I updated my system prompt to force a statistical "Confidence Metric" (SutniPrompt v0.6.0-beta)
Switching from prompt engineering on LLMs to an agentic setup. Any differences?
Social Simulation with LLMs - Fidelity in Applications (CFP @ COLM'26) [R]
LLMs believe false statements even after explicit warnings that they're false
Fine-tuning tests show "bias ... toward confidently representing the claims as true."…
Anthropic takes 8 spots in top 10 most secure LLMs
The promise of AI-driven productivity comes with a catch: every implementation hands over the keys to your company's data and operations to new technology, unl...…
Is 2× Intel Arc Pro B70 worth it for local agentic LLMs, or should I stay with NVIDIA?
Instead of LLMs we need self-updating SLMs
Instead of LLMs we need self-updating SLMs: Download a SLM to your computer. It's MoE with 35B, 1B active. Feed it your codebase and various documentation. Its weights are update…
I replaced NotebookLM with this free tool that uses my local LLMs
It lets me use my own local LLMs instead of being locked into Google's models…
Benchmarking LLMs for Web Tasks
Comparisons of how LLMs perform for a bunch of web tasks…
Have LLMs actually made product development faster?
Two Knowledge Hierarchies: Structuring Context for AI Agents and LLMs
TestSmith has two distinct audiences that need context about the project: AI agents that work on the...…
The AI Agent Harness: The Glue That Turns LLMs into Digital Workers
AI models have plateaued on raw intelligence. The next gains come from what you build around them.…
Can LLMs Introspect? A Reality Check
Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question is yes. We argue, based on lessons from huma…
OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling
Theory of Mind (ToM), the ability to infer others' knowledge, intentions, and emotions, is commonly evaluated in large language models (LLMs) using end-point question answering, wh…
Helicase: Uncertainty-Guided Supply Chain Knowledge Graph Construction with Autonomous Multi-Agent LLMs
LLM-based multi-agent systems have been widely adopted for knowledge retrieval and report generation, synthesizing known information through web search and textual reasoning. Howev…
Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
Retrieval-augmented LLMs are deployed for tasks where evidence quality determines action safety, yet evaluation protocols assume that single-turn robustness predicts robustness whe…
Open vs Closed LLMs in 2026: The Game-Changing Convergence [03:32:15]
An in-depth look at the AI agent revolution reshaping software development and business automation in 2026.…
Microsoft Research: LLMs Corrupt your files during delegated work
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust…
I tested 4 methods to make LLMs write literary subtext. Few-shot with 5 examples beat fine-tuning and DPO.
After self-hosting LLMs for a year, I realized that models are not the real bottleneck
I stopped upgrading models and fixed my prompting instead.…
Connecting LLMs to Your Data With Python MCP Servers
Build an MCP server in Python that exposes tools, resources, and prompts so AI agents like Cursor can interact with your data.…
Stop Using LLMs Like Giant Problem Solvers
How I turned 100 messy pdfs into structured insights by building a deterministic loop around agents…
Quiz: Connecting LLMs to Your Data With Python MCP Servers
Test what you know about the Model Context Protocol by reviewing MCP servers, clients, tools, resources, and prompts in Python.…
Amdahl's Law for LLM generated code
Why We Need Behavioral Benchmarks for LLMs — Not Just More Knowledge Tests
Would you hire an engineer based on their SAT score? Of course not. You look at how they solve...…
What GUI are you using for local LLMs on Mac?
I finally stopped forcing local LLMs and switched back to cloud AI
Cloud AI isn't perfect, but it actually works.…