Search: "ai misalignment" — WeSearch Press

4 stories match your query across our catalog of 700+ sources, ranked by relevance and recency.

$2,500 bug bounty for real-world AI misalignment

🤖 Bug bounty for AI misalignment. Submit real-world instances of AI systems behaving contrary to human intent, values, or safety — win up to $2,500. - Hodlatoor/SyntheticOutlaw…

Wed, 29 Apr 2026 05:24:25 GMT · 12 views

ARXIV.ORG

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user reque…

Tue, 28 Apr 2026 04:13:21 GMT · 4 views

ARXIV.ORG

Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop

Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable…

Tue, 28 Apr 2026 04:13:21 GMT · 4 views

ARXIV.ORG

Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models

Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views
