My Favorite Bugs: Invalid Surrogate Pairs
4 hours ago
- #Unicode
- #JavaScript
- #Debugging
- The author's favorite bug involved invalid surrogate pairs causing silent sync failures in a collaborative editor.
- The bug was triggered only by specific edits, such as inserting one emoji directly next to another, which could split a surrogate pair.
- Debugging revealed the cause: JavaScript's string methods operate on UTF-16 code units, not on code points or graphemes.
- The issue occurred in lib0's splice method using .slice(), leading to orphaned surrogates and URI errors.
- The temporary fix was to add an error listener and make emoji atomic nodes; lib0 itself was eventually patched.
- The modern solution is to use Intl.Segmenter for grapheme-aware string manipulation, which avoids this class of bug.
- This bug highlights the pitfalls of UTF-16 in JavaScript and how Unicode complexities can break applications.
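
The failure mode summarized above can be reproduced in a few lines. The following is a minimal sketch, not lib0's actual code: `naiveSplice`, `hasLoneSurrogate`, and `graphemeSplice` are hypothetical helpers written for illustration. It shows how a `.slice()`-based splice can cut a surrogate pair in half, why the result makes `encodeURIComponent` throw, and how `Intl.Segmenter` can be used to splice on grapheme boundaries instead.

```javascript
// Hypothetical helper: splice implemented with .slice(), which indexes
// UTF-16 code units. This is an illustration, not lib0's implementation.
function naiveSplice(str, index, remove, insert = "") {
  return str.slice(0, index) + insert + str.slice(index + remove);
}

// Hypothetical helper: true if the string contains a high or low
// surrogate that is not part of a well-formed pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair, skip the low surrogate
        continue;
      }
      return true; // high surrogate without a following low surrogate
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate without a preceding high surrogate
    }
  }
  return false;
}

// Hypothetical helper: splice on grapheme boundaries using Intl.Segmenter.
// Note that index and remove count graphemes here, not code units.
function graphemeSplice(str, index, remove, insert = "") {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(str), (s) => s.segment);
  graphemes.splice(index, remove, insert);
  return graphemes.join("");
}

const text = "a😀b"; // the emoji occupies two code units, so text.length is 4

// Code unit index 2 falls between the emoji's two surrogates.
const broken = naiveSplice(text, 2, 0, "x");
console.log(hasLoneSurrogate(broken)); // true

try {
  encodeURIComponent(broken);
} catch (err) {
  console.log(err instanceof URIError); // true
}

// Grapheme index 2 falls after the emoji, so the pair stays intact.
const intact = graphemeSplice(text, 2, 0, "x");
console.log(intact === "a😀xb"); // true
```

One design point worth noting: switching to grapheme indices changes what an offset means, so any caller that stores positions as code unit offsets would need to convert them. Segmenting the whole string on every edit may also be costly for long documents.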