30 results for "llm"
Arc Gate — LLM proxy that catches 100% of indirect/roleplay prompt injection attacks (beats OpenAI Moderation and LlamaGuard)
Built an LLM proxy that sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked against OpenAI Moderation API and LlamaGuard 3 8B on 40 ou…
Talkie: a 13B LLM trained only on pre-1931 text used Claude Sonnet to help test the model and judge its output
Researchers Alec Radford (GPT, CLIP, Whisper), Nick Levine, and David Duvenaud just released talkie : a 13 billion parameter language model trained exclusively on text published before 1931. No intern…
Understanding the LLM Bubble
If there is no path to superintelligence by 2028, and there is little prospect of the dramatic product improvements needed to drive major short-term revenue growth (including solutions to inaccuracy a…
LLM Prompt Engineering in Practice: CoT, Few-Shot, and System Prompt Design
LLM Prompt Engineering in Practice: CoT, Few-Shot, and System Prompt Design Stop getting...…
Hillman Solutions Corp. 2026 Q1 - Results - Earnings Call Presentation
2026-04-28. The following slide deck was published by Hillman Solutions Corp.…
LLMs Corrupt Your Documents When You Delegate
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation t…
Sage-Wiki: An LLM-compiled personal knowledge base
An LLM-compiled personal knowledge base. Drop in your papers, articles, and notes. sage-wiki compiles them into a structured, interlinked wiki — with concepts extracted, cross-references discovere...…
Yann LeCun: LLMs Are Nearing the End, but Better AI Is Coming (2025)
Yann LeCun, Chief AI Scientist at Meta, believes LLMs are doomed due to their inability to represent the high-dimensional spaces that characterize our world…
We Fixed Karpathy’s LLM Wiki - PENgram Is the Typed Knowledge Graph Pipeline Everyone Asked For
We recently published an article about the gaps in Karpathy's LLM Wiki pattern. The thesis was...…
Centene’s Obamacare Enrollment Drops By 2 Million After Congress Strips Subsidies
Health insurer Centene reported first quarter net income of $1.5 billion despite a 2 million enrollee drop in Obamacare enrollment.…
Show HN: Waiting for LLMs Suck – Give your user a game
Give your user a game while they wait for the LLM to return a result.…
Don't Make the LLM Read the Graph: Make the Graph Think
We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanab…
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instabili…
Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach
Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear …
PhySE: A Psychological Framework for Real-Time AR-LLM Social Engineering Attacks
The emerging threat of AR-LLM-based Social Engineering (AR-LLM-SE) attacks (e.g. SEAR) poses a significant risk to real-world social interactions. In such an attack, a malicious actor uses Augmented R…
Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines
LLM-as-a-Judge has become the dominant paradigm for evaluating language model outputs, yet LLM judges exhibit systematic biases that compromise evaluation reliability. We present a comprehensive empir…
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on lon…
LEGO: An LLM Skill-Based Front-End Design Generation Platform
Existing LLM-based EDA agents are often isolated task-specific systems. This leads to repeated engineering effort and limited reuse of successful design and debugging strategies. We present LEGO, a un…
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is groun…
Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task
Large language models (LLMs) have shown strong performance on legal benchmarks, including multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in …
ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation
Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish adding a missi…
LLM-Augmented Traffic Signal Control with LSTM-Based Traffic State Prediction and Safety-Constrained Decision Support
Traffic signal control is a critical task in intelligent transportation systems, yet conventional fixed-time and rule-based methods often struggle to adapt to dynamic traffic demand and provide limite…
Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs
Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (…
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People
Indoor navigation remains a critical accessibility challenge for the blind and low-vision (BLV) individuals, as existing solutions rely on costly per-building infrastructure. We present an agentic fra…
Disaggregated Serving for Hybrid SSM Models in vLLM
LLMs Can't Generate Influence
Are people using AI/LLMs in Defense or Secure Environments?
Monitoring LLM behavior: Drift, retries, and refusal patterns
Tool for inline annotation of LLM-generated specs and prompts (works with any MCP client)
I'm a product manager and spend a lot of time iterating on long prompts and specs that AI agents then act on. The review loop has been the worst part. When the model gives me a 5-page draft, leaving f…