Redundant Information in LLM Weights
Large language model (LLM) weights stored in formats like bfloat16 may contain substantial redundancy: empirical measurements show a clear gap between the bits allocated per parameter and the information those bits actually carry. Shannon entropy estimates put BF16 weights at about 10.6 bits of information per parameter, so roughly 5.4 of the 16 allocated bits (about a third) go unused. This suggests weights could be represented more compactly without losing meaningful information.
- BF16 weights carry approximately 10.6 bits of entropy per element, despite using 16 bits for storage.
- The gap between allocated bits and actual entropy represents wasted capacity in current LLM weight formats.
- Weight distributions across various models and scales were analyzed to measure information content using Shannon entropy (a minimal sketch of this measurement follows the list).
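The measurement described above is straightforward to reproduce in outline. Below is a minimal sketch (not the article's code) that treats each bfloat16 weight as a 16-bit symbol, builds a histogram over the 2^16 possible bit patterns, and reports the Shannon entropy in bits per element. The `bf16_entropy_bits` helper and the synthetic Gaussian "weights" are illustrative assumptions; a real analysis would load parameter tensors from an actual checkpoint.

```python
import numpy as np

def bf16_entropy_bits(weights: np.ndarray) -> float:
    """Shannon entropy, in bits per element, of weights viewed as bfloat16 bit patterns."""
    # Cast to float32 and reinterpret the bytes as unsigned 32-bit integers.
    as_u32 = np.ascontiguousarray(weights, dtype=np.float32).view(np.uint32)
    # Keep the top 16 bits: a simple (truncating) float32 -> bfloat16 conversion.
    symbols = (as_u32 >> 16).astype(np.uint16)

    # Empirical distribution over all 2**16 possible bfloat16 bit patterns.
    counts = np.bincount(symbols, minlength=2**16)
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log2(0) is taken as 0, so drop unused symbols

    return float(-(p * np.log2(p)).sum())

# Illustrative run on synthetic Gaussian "weights"; a real measurement would
# iterate over the tensors of an LLM checkpoint instead.
rng = np.random.default_rng(0)
fake_weights = rng.normal(loc=0.0, scale=0.02, size=1_000_000)
print(f"{bf16_entropy_bits(fake_weights):.2f} bits of entropy per 16-bit weight")
```

Truncating to the top 16 bits is used here for simplicity; a real bfloat16 cast rounds to nearest even, which can shift individual bit patterns but not the overall shape of the histogram this measurement depends on.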
Opening excerpt (first ~120 words):
In search of wasted bits: how much information do LLM weights carry? (5 May 2026, 11 min read)

On this page: How? · Baseline: 16 bits per weight · Why the exponent? · Half the bits: 8 bits per weight · Below the byte floor: 4 bits per weight · What's left? · Footnotes

If you store a model's weights in bfloat16, each parameter gets 16 bits. That's the budget. The question is whether we're spending it well. Information theory gives us a clean way to ask this. Shannon entropy measures the average information content per symbol in a stream of data. If every possible byte value appears equally often, entropy is maximal and there's nothing to squeeze out. If certain values dominate, entropy drops below the bit-width, and the difference is waste: bits allocated but carrying no information.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is available at Fergusfinn.
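As background for the excerpt: the entropy it invokes is the standard Shannon definition. For symbols $x$ drawn from an alphabet $\mathcal{X}$ with empirical frequencies $p(x)$,

$$ H = -\sum_{x \in \mathcal{X}} p(x)\,\log_2 p(x), $$

which reaches its maximum of $\log_2 |\mathcal{X}|$ bits exactly when every symbol is equally likely: 8 bits for bytes, 16 bits for bfloat16 bit patterns. Any gap below that maximum is the "waste" the article sets out to measure.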