Jan 15, 2021
I'm a big fan of UTF-8, because it gives me the ability to do so many things. Your criticisms don't hit the mark for me. For example, the ternary-trie search (https://github.com/readwritetools/ternwords) fully implements word look-ahead on the entire Unicode character set, encoded as UTF-8.
For an in-the-wild demonstration, try clicking on the 🔎 button on https://readwritestack.com/components/search.blue
- Search for words beginning with š (a letter with a diacritic, not typically included in codepages) and you'll discover šefik.
- Search for words beginning with γ (a Greek letter) and you'll discover γεωγραφία.
- Search for Hebrew words beginning with ט״ (remember that Hebrew is a right-to-left language) and you'll discover ט״ו and ט״ז.
- Search for Japanese words beginning with 漢 (this character takes three bytes in UTF-8) and you'll discover 漢字.
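
To make the idea concrete, here is a minimal sketch of prefix look-ahead with a ternary search trie keyed on Unicode code points. It is not the ternwords implementation (that project is JavaScript, and I'm not reproducing its API); the class and method names (`TernaryTrie`, `insert`, `starts_with`) and the tiny word list are assumptions made purely for illustration. It also assumes the query and the stored words use the same Unicode normalization form.

```python
from typing import List, Optional


class Node:
    """One trie node: a single code point plus three child links."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["Node"] = None   # code points smaller than ch
        self.eq: Optional["Node"] = None   # next code point of the word
        self.hi: Optional["Node"] = None   # code points larger than ch
        self.is_word = False


class TernaryTrie:
    """Hypothetical illustration, not the ternwords API."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str) -> None:
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node: Optional[Node], word: str, i: int) -> Node:
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, prefix: str) -> Optional[Node]:
        """Return the node matching the last code point of prefix."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    return node
                node = node.eq
        return None

    def starts_with(self, prefix: str) -> List[str]:
        """All stored words beginning with prefix, in code point order."""
        if not prefix:
            return []
        node = self._find(prefix)
        if node is None:
            return []
        found: List[str] = [prefix] if node.is_word else []
        self._collect(node.eq, prefix, found)
        return found

    def _collect(self, node: Optional[Node], stem: str, found: List[str]) -> None:
        if node is None:
            return
        self._collect(node.lo, stem, found)
        if node.is_word:
            found.append(stem + node.ch)
        self._collect(node.eq, stem + node.ch, found)
        self._collect(node.hi, stem, found)


# Illustrative word list taken from the examples above.
WORDS = ["šefik", "γεωγραφία", "ט״ו", "ט״ז", "漢字"]
trie = TernaryTrie()
for w in WORDS:
    trie.insert(w)

for prefix in ["š", "γ", "ט״", "漢"]:
    # Show the prefix, its UTF-8 byte length, and the matching words.
    print(prefix, len(prefix.encode("utf-8")), trie.starts_with(prefix))
```

Because Python strings are sequences of code points, the trie never sees the UTF-8 bytes at all; the encoding only matters when the words are read from or written to storage. Right-to-left scripts need no special handling here either, since text direction is a display concern and the stored order is logical order.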