Why Rewriting Emacs Is Hard
6 days ago
- #Emacs
- #Programming
- #Text Processing
- Emacs supports character codes up to #x3FFFFF, well beyond Unicode's maximum of #x10FFFF; the extra range is used for lossless file editing and for handling non-Unicode encodings.
- Emacs keeps file editing lossless by preserving bytes that are invalid in the file's encoding as raw-byte characters, so saving writes them back exactly as they were read.
- Emacs reserves code point space for characters not yet unified in Unicode, treating them as normal characters in ELisp.
- Emacs exposes mutable case tables for string transformations, enabling custom case conversions beyond Unicode standards.
- Emacs regexps are a dialect of their own: they support features such as assertions on the cursor position and syntax-aware matching, which makes them incompatible with common regexp libraries.
- Emacs buffers are complex, integrating text properties, overlays, markers, and indirect buffers, all synchronized with text edits.
- Editors differ in how they implement buffers (gap buffers, ropes, piece trees), with metadata stored in trees for performance.
- Text properties carry over when text moves between Emacs strings and buffers, and handling them gets complex during multibyte-to-unibyte conversions.
- The complexity of redisplay and the design of buffers both affect performance, and they make parallelization and structured text representation challenging.
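
The raw-byte points above can be made concrete with a small sketch. It is not Emacs's implementation: it leans on Python's `surrogateescape` error handler to find undecodable bytes, then remaps them into the #x3FFF80..#x3FFFFF range that Emacs uses for raw 8-bit bytes. The names `decode_lossless` and `encode_lossless` are hypothetical, and characters are modeled as plain integers because Python's own `str` cannot hold code points above #x10FFFF.

```python
RAW_BYTE_BASE = 0x3FFF00   # raw byte 0x80..0xFF maps to 0x3FFF80..0x3FFFFF
MAX_CHAR = 0x3FFFFF        # largest character code Emacs accepts
UNICODE_MAX = 0x10FFFF     # largest Unicode code point


def decode_lossless(data: bytes) -> list[int]:
    """Decode UTF-8 into integer character codes, keeping invalid bytes.

    Assumption: Python's surrogateescape handler stands in for a real
    decoder; it turns each undecodable byte 0xNN into U+DCNN.
    """
    chars = []
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Undecodable byte: move it into the raw-byte range.
            chars.append(RAW_BYTE_BASE + (cp - 0xDC00))
        else:
            chars.append(cp)
    return chars


def encode_lossless(chars: list[int]) -> bytes:
    """Encode character codes back to bytes, restoring raw bytes exactly."""
    out = bytearray()
    for cp in chars:
        if 0x3FFF80 <= cp <= MAX_CHAR:
            out.append(cp - RAW_BYTE_BASE)      # write the original byte
        elif 0 <= cp <= UNICODE_MAX:
            out += chr(cp).encode("utf-8")      # ordinary Unicode character
        else:
            raise ValueError(f"no encoding for character code {cp:#x}")
    return bytes(out)


data = b"caf\xc3\xa9 \xff\xfe ok"   # valid UTF-8 with two stray bytes
chars = decode_lossless(data)
restored = encode_lossless(chars)
```

Here `restored` equals `data` byte for byte, even though `\xff` and `\xfe` are not valid UTF-8. A rewrite built on a string type limited to valid Unicode would need some equivalent escape hatch to offer the same guarantee.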
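
To illustrate why buffer metadata has to stay synchronized with text edits, here is a minimal gap buffer with markers. It is a sketch under simplifying assumptions, not how Emacs stores buffers: the class `GapBuffer` and its methods are hypothetical, markers live in a plain dictionary rather than a tree, and text properties, overlays, and indirect buffers are left out entirely.

```python
class GapBuffer:
    """Toy gap buffer: text is stored around a movable gap of free slots."""

    def __init__(self, text: str = "", gap: int = 16):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)
        self.gap_end = len(self.buf)
        self.markers = {}  # marker name -> character position

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos: int) -> None:
        # Shift characters across the gap until it starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, pos: int, s: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(s):
            grow = len(s) + 16
            self.buf[self.gap_end:self.gap_end] = [None] * grow
            self.gap_end += grow
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1
        # Markers after the insertion point shift right; a marker exactly
        # at pos stays put (assumed to mirror Emacs's default behavior).
        for name, m in self.markers.items():
            if m > pos:
                self.markers[name] = m + len(s)

    def delete(self, start: int, end: int) -> None:
        if not 0 <= start <= end <= len(self):
            raise IndexError((start, end))
        self._move_gap(start)
        self.gap_end += end - start  # the gap swallows the deleted text
        for name, m in self.markers.items():
            if m > end:
                self.markers[name] = m - (end - start)
            elif m > start:
                self.markers[name] = start  # marker was inside the deletion

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


b = GapBuffer("hello world")
b.markers["m"] = 6           # marker just before "world"
b.insert(0, ">> ")           # marker follows the text: now at 9
```

Every edit in this sketch walks all markers, which is linear in their number. That cost is the reason the bullet on buffer implementations mentions storing metadata in trees.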