Sparse Autoencoders Reveal Cortical Brain-LLM Semantic Mapping

Let's Data Science· May 26, 2026 · 10:35 AM UTC ·3 min read · 0 reactions · 0 comments · 41 views

#neuroscience #machine learning #language models

TL;DR · WeSearch summary

A recent preprint explores the connection between large language models and human brain semantics using sparse autoencoders. The study demonstrates that these autoencoders can extract interpretable features from models like GPT-2 and Llama-3, achieving significant alignment with neural encoding performance. Findings suggest that this approach could enhance our understanding of cognitive neuroscience and model interpretability.

Key facts

▪The preprint presents a mechanistic interpretability approach linking large language model representations to human cortical semantic organization.
▪Sparse autoencoders were used to decompose GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per layer.
▪The authors report that semantic features alone recover 94% of peak neural encoding performance, outperforming variance-matched baselines.

Original article

Let's Data Science · Let's Data Science

Read full at Let's Data Science →

Opening excerpt (first ~120 words) tap to expand

Models & Researchsparse autoencodersbrain llm alignmentcomputational neurolinguisticsgpt 2Sparse Autoencoders Reveal Cortical Brain-LLM Semantic Mapping2 sources|May 25, 20267.0Relevance ScorePhoto: arxiv.org · rights & takedownsQuick SummaryHideA preprint submitted to arXiv (arXiv:2605.23035) by Dongxin Guo and colleagues presents a mechanistic interpretability approach connecting large language model representations to human cortical semantic organization. According to the arXiv preprint and the CoNLL openreview entry, the authors use sparse autoencoders (SAEs) to decompose GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per layer.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Let's Data Science.

Anonymous · no account needed

Discussion

0 comments

Sparse Autoencoders Reveal Cortical Brain-LLM Semantic Mapping

Discussion

More from Let's Data Science