PitchBench: Measuring Pitch Hearing in Audio-Language Models

May 27, 2026 · 4:00 AM UTC · 3 min read

#audio #artificial intelligence #music #evaluation #pitch

TL;DR · WeSearch summary

The article introduces PitchBench, a new evaluation suite designed to measure pitch hearing in audio-language models (ALMs). It highlights the importance of reliable musical perception for ALMs in various applications, such as music tutoring and transcription. The findings indicate that current ALMs struggle with stable pitch perception across different conditions.

Key facts

▪ PitchBench comprises 28 experiments that assess absolute and relative pitch perception under varying acoustic conditions.
▪ Current evaluations of ALMs often fail to directly measure pitch hearing, relying instead on higher-level tasks.
▪ The study reveals that ALMs perform poorly in pitch perception tasks, with accuracy varying significantly based on sound source and note duration.
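
For readers unfamiliar with the terms, the sketch below illustrates what absolute pitch (naming a single note) and relative pitch (judging the interval between two notes) mean numerically, using the standard equal-temperament relation between MIDI note numbers and frequency. It is not taken from PitchBench: the function names, the 16 kHz sample rate, and the pure sine-tone stimulus are all assumptions made for illustration only.

```python
import math

# Standard equal-temperament reference: MIDI note 69 is A4 at 440 Hz.
A4_MIDI = 69
A4_HZ = 440.0
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi_note: int) -> float:
    """Convert a MIDI note number to its frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def hz_to_note_name(freq_hz: float) -> str:
    """Absolute pitch: name the nearest equal-tempered note (e.g. 'A4')."""
    midi = round(A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ))
    octave = midi // 12 - 1  # MIDI convention where note 60 is C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def interval_semitones(f1_hz: float, f2_hz: float) -> float:
    """Relative pitch: signed distance from f1 to f2 in semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def sine_tone(freq_hz: float, duration_s: float, sample_rate: int = 16000) -> list[float]:
    """Hypothetical stimulus: a pure sine tone as a list of samples in [-1, 1]."""
    n_samples = int(duration_s * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]
```

Changing the `duration_s` argument or replacing the sine tone with a different sound source is one way to picture the kind of acoustic variation the key facts mention, though the benchmark's actual stimuli may differ.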

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Sound · arXiv:2605.26176 (cs) · [Submitted on 25 May 2026]

Title: PitchBench: Measuring Pitch Hearing in Audio-Language Models

Authors: Milan Liessens Dujardin, Song-Ze Yu, Craver Corbyn Thomas-Smith, David M. Chan, Karina Nguyen

Abstract: Audio-language models (ALMs) are increasingly used in real-world applications that require understanding music, from music tutoring and transcription to captioning, recommendation systems, and music production. More broadly, they are becoming an important component of multimodal AI systems that must reason from sensory input rather than text alone.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
