28 results for "system prompt"
LLM Prompt Engineering in Practice: CoT, Few-Shot, and System Prompt Design
Stop getting…
[Open Source] 1,446 trending AI image prompts for GPT Image 2 & NanoBanana, system prompt & MCP included
Claude system prompt bug wastes user money and bricks managed agents
Regression summary: Issue #47027 was closed by @bcherny in February saying "This was fixed in v2.1.92." I'm running v2.1.111 (19 versions past the fix) and the exact same behavior reproduces reliabl…
Claude Leak Confirms It: LLM Systems Are Architecture, Not Prompts (Orca)
Agents should execute whenever possible — runtime for composable AI agent skills - gfernandf/agent-skills…
How do you handle RP-style prompts (actions + dialogue) in LLM systems?
A Systematic Approach for Large Language Models Debugging
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…
A systematic evaluation of vision-language models for observational astronomical reasoning tasks
Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities r…
I built Claude Code skills for writing agent prompts, grounded in prompt research
I've been building agentic systems for a while and wanted a more systematic approach to writing prompts. So I gathered papers, did some deep research and created guides on structure, format and prompt…
OpenAI Agents SDK Tutorial: Build Multi-Agent AI Systems in Python (2025)
How to move beyond single-prompt chatbots and create AI workflows that plan, collaborate, and get things done — with working code you can run today.
OpenGame: Open Agentic Coding for Games
Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across ma…
The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…
RCSB PDB AI Help Desk: retrieval-augmented generation for protein structure deposition support
Motivation: Structural Biologists have contributed more than 245,000 experimentally determined three-dimensional structures of biological macromolecules to the Protein Data Bank (PDB). Incoming data a…
See No Evil: Semantic Context-Aware Privacy Risk Detection for AR
Augmented reality (AR) systems pose unique privacy risks due to their continuous capture of visual data. Existing AR privacy frameworks lack semantic understanding of visual content, limiting their ef…
Quoting OpenAI Codex base_instructions
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. — OpenAI Codex base_instruct…
why does GPT 5.5 have a restraining order against "Raccoons," "Goblins," and "Pigeons"?
I just saw the full system prompt leak for 5.5 (April 23rd release). Most of it is standard agentic stuff,…
I built a structured context layer for AI coding agents so they stop generating the wrong UI
Most AI coding agents fail at UI not because they can't write code, but because they have no context about what you actually want. You ask Claude Code or Cursor to "build a hero section" and you get s…
I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path
I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainl…
FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery (\textit…
ArguAgent: AI-Supported Real-Time Grouping for Productive Argumentation in STEM Classrooms
Argumentation is a core practice in STEM education, but its productivity depends on who participates and how they interact. Higher-achieving students often dominate the talk and decision-making, while…
Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…
When AI reviews science: Can we trust the referee?
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) of…
Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture
Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user reque…
GAMED.AI: A Hierarchical Multi-Agent Framework for Automated Educational Game Generation
We introduce GAMED.AI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic…
Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop
Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable…
I think I’m using ChatGPT wrong
I think I’m using ChatGPT wrong, and it’s becoming increasingly difficult to find a place for it in my workflow. I’ve been a Plus subscriber since day one, but ever since the release of the GPT-5s, I’…
ChatGPT-psychosis: How it can occur and how to avoid it.
Hey everyone, If there are AI developers, prompt engineers, or system architects here, this is especially for you. You should really take this into account. We have all seen the reports about the nega…
Three things I've measured about Claude's behavior in long sessions — with reproducible test cases
Running production Claude agents for 35 days. Some behavioral patterns I've confirmed with reproducible tests: **Pattern 1: Constraint adherence weakens at high token depth** Test: System: "Always r…
We built an open-source proxy that enforces rules for GPT agents at the API layer - 700 stars
If you've built anything on top of the OpenAI API and tried to enforce business rules via system prompts, you know the frustration: the model sometimes just ignores them. We built Caliber to tackle th…