5 stories tagged with #interpretability, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Refusal in Language Models Is Mediated by a Single Direction
The 2026 Mandate: From Model Velocity to Algorithmic Governance
For the past decade, the tech industry has been obsessed with velocity. We celebrated the speed of…
Towards Causally Interpretable Wi-Fi CSI-Based Human Activity Recognition with Discrete Latent Compression and LTL Rule Extraction
We address Human Activity Recognition (HAR) utilizing Wi-Fi Channel State Information (CSI) under the joint requirements of causal interpretability, symbolic controllability, and d…
A Systematic Approach for Large Language Models Debugging
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging …
Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features
Sparse autoencoders (SAEs) extract millions of interpretable features from a language model, but flat feature inventories aren't very useful on their own. Domain concepts get mixed…