A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting
The paper reports a negative result on cross-model activation transfer in a multi-hop reasoning setting using Pythia models. Although a translation layer achieved high normalized cosine similarity between the two models' hidden states, injecting the translated activations did not improve downstream performance. This indicates that, in this setting, representational alignment alone is insufficient for effective communication between models during inference.
- The study investigates whether one language model can communicate intermediate reasoning states to another model.
- A linear translation layer mapped hidden states from one model to the other with high normalized cosine similarity.
- However, injecting these translated activations did not improve performance, which remained close to the baseline.
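As a rough illustration of what a linear translation layer and a cosine-similarity check might look like, here is a minimal sketch. It is not the paper's procedure: the hidden states are synthetic random arrays rather than Pythia activations, the map is fitted by ordinary least squares, and the names `fit_linear_translation` and `mean_cosine_similarity` are hypothetical. The paper's "normalized cosine similarity" may also be defined differently from the plain mean cosine similarity used here.

```python
import numpy as np


def fit_linear_translation(src, tgt):
    """Least-squares fit of W so that src @ W approximates tgt."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def mean_cosine_similarity(a, b):
    """Average row-wise cosine similarity between two (n, d) arrays."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_unit * b_unit, axis=1)))


# Synthetic stand-ins for hidden states (assumption: real experiments would
# extract these from the two models at chosen layers and token positions).
rng = np.random.default_rng(0)
n, d_src, d_tgt = 512, 32, 48
src_hidden = rng.normal(size=(n, d_src))
true_map = rng.normal(size=(d_src, d_tgt))
tgt_hidden = src_hidden @ true_map + 0.1 * rng.normal(size=(n, d_tgt))

# Fit on the first 400 rows, evaluate on the held-out remainder.
W = fit_linear_translation(src_hidden[:400], tgt_hidden[:400])
translated = src_hidden[400:] @ W
similarity = mean_cosine_similarity(translated, tgt_hidden[400:])
print(f"held-out mean cosine similarity: {similarity:.3f}")
```

A high similarity score in a sketch like this only says that the translated vectors point in roughly the same direction as the targets. Consistent with the result summarized above, it says nothing about whether a receiving model can make use of the injected vectors.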
Authors: Peiyan Zhang
arXiv:2606.03280 (cs), Computer Science > Artificial Intelligence, submitted on 2 Jun 2026
Abstract: Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.