Byte Type: Supporting Raw Data Copies in the LLVM IR
a day ago
- #Memory Management
- #LLVM
- #Compiler Optimization
- The project aimed to add a new byte type to LLVM IR to represent raw memory values, enabling native implementation of memory-related intrinsics like memcpy, memmove, and memcmp.
- The byte type addresses two core problems with loading raw memory through integer types: integers do not track pointer provenance, and a single poison bit poisons the entire loaded value.
- Pointer provenance is crucial for alias analysis. Loading a pointer through the byte type retains its provenance, whereas loading it through an integer type discards it.
- The byte type represents poison at bit level, so individual poison bits can be tracked. With an integer type, any poison bit taints the entire value.
- Implementation included porting a previous GSoC prototype, adapting it for opaque pointers, and refining the proposal iteratively as challenges emerged.
- The byte type is a first-class single-value type with the same size and alignment as equivalent integer types, capable of representing both pointer and non-pointer values.
- New instructions like bytecast were introduced to reinterpret byte values as other primitive types, with options to allow or disallow type punning.
- Optimizations such as memcmp lowering, load widening, and value coercion were reworked to use the byte type, fixing previously unsound transformations.
- Clang was modified to lower C/C++ raw memory access types (char, signed char, unsigned char, std::byte) to the byte type, updating code generation to handle these changes.
- Benchmarks showed only minor changes in compile time, object size, and run-time performance across the tested applications.
- Future work includes addressing remaining failing Clang regression tests, extending support to other architectures, and documenting the byte type in the Language Reference.
- The project concluded successfully, solving a long-standing LLVM problem with minimal performance overhead and enabling safer, more accurate memory optimizations.
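
To make the two problems from the summary concrete, here is a minimal toy model in Python. It is not LLVM code and does not reproduce the proposal's exact semantics: the `Byte` class, the `POISON` marker, and the `load_as_int` / `load_as_bytes` helpers are hypothetical names invented for this sketch. It assumes each memory byte carries data bits, a per-bit poison mask, and an optional provenance label, and it contrasts an integer-typed load with a byte-typed load.

```python
from dataclasses import dataclass
from typing import List, Optional

POISON = "poison"  # hypothetical stand-in for a fully poisoned value


@dataclass(frozen=True)
class Byte:
    """Toy model of one 8-bit raw memory value (illustrative only)."""
    bits: int                         # the 8 data bits
    poison_mask: int = 0              # bit i set means bit i is poison
    provenance: Optional[str] = None  # allocation a pointer byte derives from


def load_as_int(mem: List[Byte]):
    """Integer-typed load in this model.

    Any poison bit taints the whole result, and provenance is dropped
    because the result is a plain number.
    """
    if any(b.poison_mask for b in mem):
        return POISON
    # Assemble the data bits in little-endian order.
    return sum(b.bits << (8 * i) for i, b in enumerate(mem))


def load_as_bytes(mem: List[Byte]) -> List[Byte]:
    """Byte-typed load in this model.

    Each byte is carried over unchanged, so per-bit poison and
    provenance both survive the copy.
    """
    return list(mem)


# A value followed by a padding byte whose bits are all poison.
padded = [Byte(0x2A), Byte(0x00, poison_mask=0xFF)]

# Two bytes of a pointer into a (hypothetical) allocation named "a".
pointer = [Byte(0x10, provenance="a"), Byte(0x20, provenance="a")]
```

In this model, `load_as_int(padded)` yields `POISON` even though the first byte is perfectly well defined, while `load_as_bytes(padded)` returns the bytes unchanged. Likewise, `load_as_int(pointer)` produces a bare integer with no provenance attached, whereas `load_as_bytes(pointer)` keeps the `"a"` label on every byte. That is the behaviour a raw copy such as memcpy needs.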