C++26: Cleaning up string literals
a day ago
- #compiler-behavior
- #string-literals
- #C++26
- P2361R6 introduces the concept of unevaluated strings in C++26: string literals that the compiler only reads during translation and never converts to the execution encoding. They may not carry an encoding prefix, and their escape sequences are limited to simple-escape-sequences and universal-character-names. Numeric escapes, including \0 (which is an octal escape), are ill-formed.
- P1854R4 makes non-encodable characters in evaluated string literals ill-formed. Previously the behavior was implementation-defined, so an unrepresentable character could be silently substituted (e.g., with ? in MSVC), corrupting the string without any diagnostic.
- Both proposals aim to improve predictability. P2361R6 ensures that compilers handle unevaluated strings (e.g., in static_assert and attributes) consistently, without an unnecessary encoding step. P1854R4 improves safety by requiring an error for any character that cannot be represented in the literal's associated encoding, which encourages the use of u8 for UTF-8 strings.
- The changes are part of a broader effort to make C++ lexing less surprising. Surveys of existing code found minimal real-world impact, and the cleanup leaves room for future extensions, such as static_assert messages produced by constant expressions, without breaking existing functionality.
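
To make the distinction between the two kinds of strings concrete, here is a minimal sketch. It is an illustration under stated assumptions, not text taken from either paper: the names `legacy_add`, `checked_add`, `answer`, `cafe_bytes`, and `check_examples` are hypothetical, and the commented-out lines show what the summary above describes as ill-formed, so they are deliberately not compiled.

```cpp
#include <cassert>
#include <cstddef>

// --- Unevaluated strings (P2361R6) ---------------------------------------
// The compiler only reads these during translation; they never become
// objects in the program. Plain text, simple escapes and
// universal-character-names are allowed.
static_assert(sizeof(char) == 1, "simple escapes such as \t and \" are fine");

// Attribute arguments are unevaluated strings.
// `legacy_add` and `checked_add` are hypothetical names for illustration.
[[deprecated("use checked_add instead")]]
int legacy_add(int a, int b);  // declared only, never called here

// The string in a linkage specification is unevaluated as well.
extern "C" int answer() { return 42; }

// Described as ill-formed for unevaluated strings (kept as comments):
//   static_assert(true, u8"message");  // encoding prefix
//   static_assert(true, "\x41");       // numeric (hexadecimal) escape
//   static_assert(true, "\0");         // numeric (octal) escape

// --- Evaluated string literals (P1854R4) ---------------------------------
// A u8 literal states its encoding explicitly, so U+00E9 is always stored
// as the two UTF-8 code units 0xC3 0xA9.
constexpr std::size_t cafe_bytes = sizeof(u8"caf\u00e9") - 1;  // drop the NUL
static_assert(cafe_bytes == 5, "3 ASCII bytes plus 2 bytes for U+00E9");

// Assumption: if the ordinary literal encoding cannot represent U+2603
// (for example, a Latin-1 execution character set), the line below is an
// error under P1854R4 instead of a silent substitution.
//   const char* snowman = "\u2603";

// Runtime spot-check of the values above; call it from main() or a test.
void check_examples() {
    assert(answer() == 42);
    assert(cafe_bytes == 5);
}
```

Writing the non-ASCII character as a universal-character-name keeps the example independent of the source file's encoding. The sketch is written to compile as C++17 or later, since the element type of the u8 literal (char or char8_t) does not affect the byte count.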