Will We Run Out of Unicodes?
The article traces Unicode's evolution from its original 16-bit design, once thought sufficient at 65,536 code points, to a code space that now allows for over a million code points. It then asks whether that larger limit could itself be exhausted one day, and what that would mean for character encoding systems.
- Unicode was created to unify various character encoding systems into a single standard for all human languages.
- The original 16-bit design was extended through UTF-16 surrogate pairs, allowing for over a million possible code points (1,114,112 in total, U+0000 through U+10FFFF).
- As linguistic diversity and new symbols emerged, the Unicode Consortium recognized the need for further expansion beyond the initial 65,536-code-point limit.
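The "over a million" figure in the points above follows from how UTF-16 pairs up two reserved ranges of 16-bit values. The sketch below works through that arithmetic and shows how one code point above U+FFFF splits into a surrogate pair. It is an illustration based on the published UTF-16 rules, not code from the article; the constant names and the `to_surrogate_pair` helper are hypothetical.

```python
# Illustrative sketch: names here are made up for this example, not taken from the article.
BMP_SIZE = 2 ** 16                 # 65,536 code points in the original 16-bit design
HIGH_SURROGATES = 0xDC00 - 0xD800  # 1,024 lead values (U+D800..U+DBFF)
LOW_SURROGATES = 0xE000 - 0xDC00   # 1,024 trail values (U+DC00..U+DFFF)

# Each lead/trail combination addresses one code point above U+FFFF.
SUPPLEMENTARY = HIGH_SURROGATES * LOW_SURROGATES  # 1,048,576
TOTAL_CODE_POINTS = BMP_SIZE + SUPPLEMENTARY       # 1,114,112 (U+0000..U+10FFFF)


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits select the lead surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits select the trail surrogate
    return high, low


if __name__ == "__main__":
    print(TOTAL_CODE_POINTS)
    high, low = to_surrogate_pair(0x1F600)
    print(hex(high), hex(low))
    # Cross-check against Python's built-in UTF-16 encoder.
    expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert "\U0001F600".encode("utf-16-be") == expected
```

Because both surrogate ranges are carved out of the original 16-bit space, the design ceiling is fixed at U+10FFFF unless the encoding itself changes, which is the constraint the article's question turns on.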
Opening excerpt (first ~120 words)
Posted on November 10, 2025 / May 29, 2026 by billpg

Before Unicode, digital text lived in a fragmented world of 8-bit encodings. ASCII had settled in as the good-enough-for-English core, taking up the first half of the codes, but the other half was a mish-mash of regional code pages that mapped characters differently depending on locale: one set for accented Latin letters, another set for Cyrillic. Each system carried its own assumptions, collisions, and blind spots.

Unicode emerged as a unifying vision: a single character set for all human languages, built on a 16-bit foundation. All developers had to do was swap their 8-bit loops for 16-bit loops. Some bristled that half the bytes were all zeros, but this was for the greater good. 16 bits made 65,536 code points.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Billpg.