WeSearch
TAG · #MODEL-EVALUATION

Model Evaluation coverage.

Every story in the WeSearch catalog tagged with #model-evaluation, listed chronologically with view counts. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

6 stories are tagged with #model-evaluation, listed in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.


RELATED TAGS
#ai (4) · #ml (2) · #bias (1) · #quantization (1) · #behavioral-specifications (1) · #security (1)
CISCO BLOGS

Multi-turn jailbreak rates across 15 frontier models (Grok 88%, Claude 12%)

The dominant safety benchmarks for frontier large language models share a structural assumption: that a single prompt and a single model response are enough to characterize how a m…

15 views · #artificial intelligence · #security
ARXIV CS.AI

How Well Do Models Follow Their Constitutions?

Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a),…

13 views · #artificial intelligence · #behavioral specifications
DEV.TO (TOP)

Building an AI Model Evaluation Pipeline on AWS for Audio Content Generation

Executive Summary: A European digital media publisher needed to determine which foundation…

11 views · #aws · #ai · #media
DEV.TO (TOP)

Building a Serverless AI Model Evaluation Platform on AWS

The Problem: A media company needed to evaluate which AI model produces the best…

13 views · #ai · #aws · #serverless
ARXIV CS.AI

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases -- and at the frontier, this interaction is the …

12 views · #machine learning · #artificial intelligence
ARXIV CS.AI

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this c…

13 views · #machine learning · #artificial intelligence · #bias