#model-evaluation — Tagged Stories

Every story in the WeSearch catalog tagged with #model-evaluation, listed newest first, with view counts. 7 stories currently carry this tag, and the page updates as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

RELATED TAGS

#ai (4) · #ml (2) · #bias (1) · #quantization (1) · #behavioral-specifications (1) · #security (1)

JULIAHUB

Frontier model evaluation for Physical AI

We tested the latest frontier models in the Dyad agent on five modeling and simulation problems, comparing accuracy, cost, time, and work style.…

11 views · Thu, 23 Jul 2026 14:30:37 GMT

#frontier #model #evaluation

CISCO BLOGS

Multi-turn jailbreak rates across 15 frontier models (Grok 88%, Claude 12%)

The dominant safety benchmarks for frontier large language models share a structural assumption: that a single prompt and a single model response are enough to characterize how a m…

31 views · Wed, 27 May 2026 22:23:18 GMT

#artificial intelligence #security

ARXIV CS.AI

How Well Do Models Follow Their Constitutions?

Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a),…

30 views · Tue, 26 May 2026 04:00:00 GMT

#artificial intelligence #behavioral specifications

DEV.TO (TOP)

Building an AI Model Evaluation Pipeline on AWS for Audio Content Generation

Executive Summary: A European digital media publisher needed to determine which foundation…

20 views · Fri, 22 May 2026 10:47:49 GMT

#aws #ai #media

DEV.TO (TOP)

Building a Serverless AI Model Evaluation Platform on AWS

The Problem: A media company needed to evaluate which AI model produces the best…

31 views · Fri, 22 May 2026 07:23:38 GMT

#ai #aws #serverless

ARXIV CS.AI

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases -- and at the frontier, this interaction is the …

35 views · Wed, 20 May 2026 04:00:00 GMT

#machine learning #artificial intelligence

ARXIV CS.AI

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this c…

33 views · Mon, 18 May 2026 04:00:00 GMT

#machine learning #artificial intelligence #bias
