How I accidentally made the fastest C# CSV parser
2 days ago
- #CSV Parsing
- #Performance Optimization
- #C#
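- The author built a fast C# CSV parser by exploiting a property of UTF-8: ASCII characters, including CSV's control characters (comma, double quote, line break), are encoded as single bytes, and every byte of a multi-byte character is 0x80 or above, so control characters can be found by scanning raw bytes at high speed.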
- Optimizations included unsafe pointers to eliminate bounds checks, loop unrolling, and SIMD instructions (SSE2/AVX2), combined with hardware-accelerated bit operations such as PopCount; together these produced significant performance gains.
- The parser splits its work into two parts: a fast pass that detects structural characters using SIMD bitmasks, and deferred extraction of field values, with a custom UTF-8 to UTF-16 conversion for escaped (quoted) fields.
- For UTF-16 input, the author used Avx2.PackUnsignedSaturate followed by a permutation to narrow UTF-16 code units to bytes, so the same fast scan can be reused; ASCII values pass through the narrowing unchanged.
- In benchmarks against other libraries, the parser was fastest in 41 of 70 tests. The author attributes this to processing UTF-8 directly, a smaller code size that reduces cache pressure, and avoiding common over-optimization pitfalls.
- The library, named FourLambda.Csv, is open-source, and the author plans to release similar high-performance tools for JSON parsing and data correction.
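To make the UTF-8 property concrete, here is a minimal scalar sketch, written as a C# 9+ top-level program. FindStructuralBytes is a hypothetical name and not part of FourLambda.Csv's API. It records the offsets of commas, quotes and line feeds by comparing raw bytes, relying only on the fact that no byte of a multi-byte UTF-8 character is below 0x80.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Scalar baseline (hypothetical helper): record the byte offset of every
// ',', '"' and '\n' in a UTF-8 buffer. Lead and continuation bytes of
// multi-byte UTF-8 characters are all >= 0x80, so comparing raw bytes
// against ASCII values never matches inside a non-ASCII character.
static List<int> FindStructuralBytes(ReadOnlySpan<byte> utf8)
{
    var positions = new List<int>();
    for (int i = 0; i < utf8.Length; i++)
    {
        byte b = utf8[i];
        if (b == (byte)',' || b == (byte)'"' || b == (byte)'\n')
            positions.Add(i);
    }
    return positions;
}

// "ï" takes 2 bytes and each CJK character takes 3, yet no offsets are misread.
byte[] data = Encoding.UTF8.GetBytes("naïve,\"日本\",ok\n");
List<int> found = FindStructuralBytes(data);
Console.WriteLine(string.Join(" ", found)); // prints: 6 7 14 15 18
```

The SIMD and unrolled variants described above speed up exactly this comparison; the byte-level correctness argument stays the same.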
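The mask-based structural detection and the PopCount step can be sketched roughly as follows. This is a minimal sketch under stated assumptions: input is processed one 32-byte block at a time, ',', '"' and '\n' are the structural characters, and StructuralMask is a hypothetical name. Using PopCount to count a block's structural positions (for example to size an output buffer) is one plausible role for it, not a description of the library's code. An AVX2 path is shown with a scalar fallback; SSE2 would do the same with 16-byte blocks and a 16-bit mask.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using System.Text;

// Hypothetical helper: bit i of the result is set when block[i] is ',', '"'
// or '\n'. Uses AVX2 when the CPU supports it, otherwise a scalar loop that
// produces the same mask.
static uint StructuralMask(ReadOnlySpan<byte> block)
{
    if (block.Length != 32) throw new ArgumentException("block must be exactly 32 bytes");
    if (Avx2.IsSupported)
    {
        Vector256<byte> v = MemoryMarshal.Read<Vector256<byte>>(block);
        Vector256<byte> hits = Avx2.Or(
            Avx2.Or(Avx2.CompareEqual(v, Vector256.Create((byte)',')),
                    Avx2.CompareEqual(v, Vector256.Create((byte)'"'))),
            Avx2.CompareEqual(v, Vector256.Create((byte)'\n')));
        // MoveMask gathers the top bit of each byte lane into one 32-bit integer.
        return (uint)Avx2.MoveMask(hits);
    }
    uint mask = 0;
    for (int i = 0; i < 32; i++)
    {
        byte b = block[i];
        if (b == (byte)',' || b == (byte)'"' || b == (byte)'\n') mask |= 1u << i;
    }
    return mask;
}

// Padded to 32 bytes for the example; a real parser handles the tail separately.
byte[] input = Encoding.ASCII.GetBytes("a,bb,\"c,c\",dd\n".PadRight(32, ' '));
uint structural = StructuralMask(input);

// PopCount: how many structural positions this block contributes.
int count = BitOperations.PopCount(structural);

// Walk the set bits from lowest to highest, clearing the lowest one each step.
var positions = new List<int>(count);
for (uint m = structural; m != 0; m &= m - 1u)
    positions.Add(BitOperations.TrailingZeroCount(m));

Console.WriteLine($"{count}: {string.Join(" ", positions)}"); // prints: 7: 1 4 5 7 9 10 13
```

Deciding which of these positions are real delimiters (the comma at offset 7 sits inside quotes) is left to a later step in this sketch.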
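Deferring extraction means a field's bytes are turned into a string only when the value is actually needed. The sketch below handles one quoted field, assuming RFC 4180-style escaping in which a literal quote is written as "". Encoding.UTF8.GetString stands in for the author's custom UTF-8 to UTF-16 converter, and UnescapeQuotedField is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical helper: given the raw UTF-8 bytes of one quoted field,
// including its surrounding quotes, collapse each "" into " and decode to a
// UTF-16 .NET string. Encoding.UTF8 replaces the custom converter here.
static string UnescapeQuotedField(ReadOnlySpan<byte> field)
{
    if (field.Length < 2 || field[0] != (byte)'"' || field[field.Length - 1] != (byte)'"')
        throw new ArgumentException("expected a field wrapped in double quotes");

    ReadOnlySpan<byte> inner = field.Slice(1, field.Length - 2);
    if (inner.IndexOf((byte)'"') < 0)
        return Encoding.UTF8.GetString(inner); // fast path: no escaped quotes

    byte[] buffer = new byte[inner.Length];
    int written = 0;
    for (int i = 0; i < inner.Length; i++)
    {
        buffer[written++] = inner[i];
        if (inner[i] == (byte)'"')
            i++; // skip the second quote of a "" pair
    }
    return Encoding.UTF8.GetString(buffer, 0, written);
}

byte[] raw = Encoding.UTF8.GetBytes("\"café, \"\"ok\"\"\"");
Console.WriteLine(UnescapeQuotedField(raw)); // prints: café, "ok"
```

The fast path matters because most quoted fields contain no escaped quotes, so they can be decoded straight from the input without copying.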
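The UTF-16 narrowing step can be sketched as below, assuming AVX2 and 32 code units per call. NarrowUtf16 is a hypothetical helper, the permutation is assumed here to be Avx2.Permute4x64, and the scalar fallback mimics the same saturation. Because the pack instruction works on each 128-bit lane separately, its output quarters are interleaved, and the permute with selector 0xD8 puts them back in source order.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper: narrow 32 UTF-16 code units to 32 bytes so a
// byte-oriented structural scanner can be reused. ASCII units pass through
// unchanged. The pack treats each unit as a signed 16-bit value:
// 0x0100-0x7FFF saturate to 0xFF and 0x8000-0xFFFF (negative) saturate to
// 0x00, so no non-ASCII unit can turn into ',', '"' or '\n'.
static byte[] NarrowUtf16(ReadOnlySpan<char> chars)
{
    if (chars.Length != 32) throw new ArgumentException("expected exactly 32 chars");
    var result = new byte[32];
    if (Avx2.IsSupported)
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(chars);
        Vector256<short> lo = MemoryMarshal.Read<Vector256<short>>(bytes);            // chars 0-15
        Vector256<short> hi = MemoryMarshal.Read<Vector256<short>>(bytes.Slice(32)); // chars 16-31
        // Per-lane packing yields 64-bit quarters in the order
        // chars 0-7, 16-23, 8-15, 24-31.
        Vector256<byte> packed = Avx2.PackUnsignedSaturate(lo, hi);
        // Swap the middle quarters back into source order (selector 0b11_01_10_00).
        Vector256<byte> ordered = Avx2.Permute4x64(packed.AsUInt64(), 0b11_01_10_00).AsByte();
        Unsafe.WriteUnaligned(ref result[0], ordered);
    }
    else
    {
        for (int i = 0; i < 32; i++)
        {
            short unit = (short)chars[i]; // reinterpret as signed, like the pack does
            result[i] = unit < 0 ? (byte)0 : unit > 0xFF ? (byte)0xFF : (byte)unit;
        }
    }
    return result;
}

string text = "id,名前,city,\"x,y\",zip,été,\uFF0C!\n".PadRight(32, ' ');
byte[] narrowed = NarrowUtf16(text);
for (int i = 0; i < narrowed.Length; i++)
    if (narrowed[i] == (byte)',') Console.Write($"{i} ");
Console.WriteLine(); // prints the comma positions: 2 5 10 13 16 20 24
```

Note that the fullwidth comma U+FF0C at index 25 narrows to 0x00 rather than to ',', which is why saturation (not truncation) is the right narrowing for this purpose.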