How I accidentally made the fastest C# CSV parser

2 days ago

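The author built a fast C# CSV parser by exploiting a property of UTF-8: every ASCII character, including the CSV control characters (comma, quote, line break), is encoded as a single byte, and that byte value never occurs inside a multi-byte sequence, which enables high-speed byte-wise scanning.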
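To make the UTF-8 property concrete, here is a minimal sketch in Python (not the author's C# code). The helper name find_control_offsets is hypothetical. It scans raw bytes without decoding and still only hits real control characters, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher.

```python
CONTROL_BYTES = frozenset(b',"\r\n')  # comma, quote, CR, LF as byte values

def find_control_offsets(data):
    """Return byte offsets of CSV control characters in UTF-8 encoded data."""
    return [i for i, b in enumerate(data) if b in CONTROL_BYTES]

text = 'naïve,"日本語, ok"\n'
data = text.encode("utf-8")
offsets = find_control_offsets(data)

# Every hit is a genuine control character: bytes that belong to multi-byte
# sequences are all >= 0x80, so they can never equal an ASCII byte like 0x2C.
print([chr(data[i]) for i in offsets])
```

The scan never needs to know where a character starts or ends, which is what makes a plain byte comparison (and, by extension, a SIMD byte comparison) safe on UTF-8 input.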
Optimizations included unsafe pointers to eliminate bounds checks, loop unrolling, and SIMD instructions (SSE2/AVX2), along with hardware-accelerated operations such as PopCount, for significant performance gains.
The parser splits processing into two phases: fast detection of structural characters using SIMD masks, followed by deferred value extraction, with a custom UTF-8 to UTF-16 conversion for escaped fields.
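The two-phase idea can be sketched as follows, in Python rather than the author's C#. This is an illustration under stated assumptions, not the library's implementation: block_mask emulates in a scalar loop what a SIMD compare plus movemask would produce, all function names are hypothetical, and rows are assumed to end in a bare LF.

```python
BLOCK = 32  # bytes per block, the width of one AVX2 register

def block_mask(block, targets):
    """Emulate SIMD compare + movemask: bit i is set when block[i] is a target."""
    mask = 0
    for i, b in enumerate(block):
        if b in targets:
            mask |= 1 << i
    return mask

def structural_index(data):
    """Phase 1: offsets of commas and newlines that lie outside quoted fields."""
    positions, in_quotes = [], False
    for base in range(0, len(data), BLOCK):
        mask = block_mask(data[base:base + BLOCK], b',"\n')
        while mask:
            bit = (mask & -mask).bit_length() - 1  # index of lowest set bit
            mask &= mask - 1                        # clear that bit
            if data[base + bit] == 0x22:            # quote toggles quoted state
                in_quotes = not in_quotes
            elif not in_quotes:
                positions.append(base + bit)
    return positions

def decode_field(raw):
    """Strip surrounding quotes, unescape doubled quotes, decode UTF-8."""
    if raw[:1] == b'"':
        raw = raw[1:-1].replace(b'""', b'"')
    return raw.decode("utf-8")

def extract_rows(data):
    """Phase 2: slice field values only after the structure is known."""
    rows, row, start = [], [], 0
    for pos in structural_index(data):
        row.append(decode_field(data[start:pos]))
        if data[pos] == 0x0A:  # newline closes the current row
            rows.append(row)
            row = []
        start = pos + 1
    if start < len(data) or row:  # last row without a trailing newline
        row.append(decode_field(data[start:]))
        rows.append(row)
    return rows

sample = 'id,name\n1,"Doe, ""JD"""\n2,naïve'.encode("utf-8")
print(extract_rows(sample))
```

Separating the passes keeps the hot loop limited to mask arithmetic; the comparatively expensive unescaping and decoding only run on fields that are actually materialized.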
For UTF-16 support, the author used Avx2.PackUnsignedSaturate followed by a permutation to narrow UTF-16 code units to bytes, so the same ASCII-oriented scan can run efficiently over UTF-16 data.
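Why saturation matters here can be shown with a small scalar model in Python. It is a sketch of the documented pack-with-unsigned-saturation semantics (each 16-bit value is read as signed and clamped to 0..255), not the author's code; the function names are hypothetical and the lane reordering handled by the permutation step is left out.

```python
import struct

def pack_unsigned_saturate(units):
    """Narrow 16-bit values to bytes: read as signed, clamp to the range 0..255."""
    out = bytearray()
    for u in units:
        signed = u - 0x10000 if u >= 0x8000 else u
        out.append(min(max(signed, 0), 255))
    return bytes(out)

def narrow_utf16(text):
    """Encode text as UTF-16LE and narrow each code unit to one byte."""
    raw = text.encode("utf-16-le")
    units = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return pack_unsigned_saturate(units)

def truncate_utf16(text):
    """Naive alternative for comparison: keep only the low byte of each unit."""
    return text.encode("utf-16-le")[0::2]

# U+012C and U+FF2C both have 0x2C (',') as their low byte.
tricky = "\u012c\uff2c,"
print(truncate_utf16(tricky))  # low bytes alias to commas
print(narrow_utf16(tricky))    # only the real comma survives as 0x2C
```

With plain truncation, both non-ASCII characters would look like commas to a byte scanner. With saturation they collapse to 0xFF or 0x00, while ASCII code units pass through unchanged.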
Benchmarks against other libraries show the parser is fastest in 41 out of 70 tests, attributed to UTF-8 processing, smaller code size reducing cache pressure, and avoiding over-optimization pitfalls.
The library, named FourLambda.Csv, is open-source, and the author plans to release similar high-performance tools for JSON parsing and data correction.
