Hasty Briefs (beta)


How I accidentally made the fastest C# CSV parser

2 days ago
  • #CSV Parsing
  • #Performance Optimization
  • #C#
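  • The author built a fast C# CSV parser by exploiting a property of UTF-8: ASCII characters, including CSV's structural characters (comma, quote, CR, LF), are always single bytes, and every byte of a multi-byte sequence is 0x80 or higher, so the input can be scanned byte by byte without decoding it first.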
  • Optimizations included unsafe pointers to eliminate bounds checks, loop unrolling, and SIMD instructions (SSE2/AVX2) combined with hardware-accelerated bit operations such as PopCount, which together yielded significant performance gains.
  • The parser splits processing into two phases: fast detection of structural characters using SIMD bitmasks, and deferred extraction of field values, with a custom UTF-8-to-UTF-16 conversion for escaped (quoted) fields.
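  • For UTF-16 input, the author used AVX2.PackUnsignedSaturate to narrow 16-bit characters to bytes, followed by a permutation to restore element order (AVX2 packs within 128-bit lanes). ASCII characters keep their values, while saturation maps every non-ASCII code unit to a byte that cannot be mistaken for a CSV structural character, so UTF-16 data can be scanned with the same byte-level logic.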
  • In benchmarks against other libraries, the parser was fastest in 41 of 70 tests; the author attributes this to processing UTF-8 directly, a smaller code size that reduces instruction-cache pressure, and avoiding common over-optimization pitfalls.
  • The library, named FourLambda.Csv, is open-source, and the author plans to release similar high-performance tools for JSON parsing and data correction.
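The two-phase design (find structural characters via bitmasks, extract values later) can be illustrated with a minimal, portable C sketch. It is not FourLambda.Csv's code, which is C#, and it rests on several assumptions: `match_mask`, `find_structural`, `count_newlines`, and `copy_field` are hypothetical names; a scalar loop stands in for the SSE2/AVX2 compare-and-movemask step; the GCC/Clang builtins `__builtin_ctzll` and `__builtin_popcountll` stand in for hardware trailing-zero-count and PopCount; and quoted fields are ignored. It relies on the UTF-8 guarantee that bytes of multi-byte sequences are all 0x80 or higher, so they never match a comma or newline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit i of the result is set when block[i] == c.
   A SIMD version gets the same bits from one compare + movemask per
   16- or 32-byte vector; this scalar loop only emulates that result. */
static uint64_t match_mask(const uint8_t *block, size_t len, uint8_t c) {
    uint64_t mask = 0;
    for (size_t i = 0; i < len; i++)
        if (block[i] == c)
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Phase 1 (hypothetical): record the byte offset of every comma and
   newline in buf, 64 bytes at a time. Quoted fields are NOT handled.
   Returns the number of offsets written to out (at most cap). */
size_t find_structural(const uint8_t *buf, size_t n, size_t *out, size_t cap) {
    assert(out != NULL || cap == 0);
    size_t count = 0;
    for (size_t base = 0; base < n; base += 64) {
        size_t len = (n - base < 64) ? n - base : 64;
        uint64_t mask = match_mask(buf + base, len, ',') |
                        match_mask(buf + base, len, '\n');
        while (mask != 0 && count < cap) {
            /* Index of the lowest set bit = next structural offset. */
            out[count++] = base + (size_t)__builtin_ctzll(mask);
            mask &= mask - 1; /* clear the lowest set bit */
        }
    }
    return count;
}

/* Counting rows needs no positions at all: popcount the newline mask. */
size_t count_newlines(const uint8_t *buf, size_t n) {
    size_t rows = 0;
    for (size_t base = 0; base < n; base += 64) {
        size_t len = (n - base < 64) ? n - base : 64;
        rows += (size_t)__builtin_popcountll(match_mask(buf + base, len, '\n'));
    }
    return rows;
}

/* Phase 2 (deferred, hypothetical): copy the bytes between two offsets
   only when a caller asks for that field. Truncates to fit dst.
   Returns the number of bytes copied. */
size_t copy_field(const uint8_t *buf, size_t start, size_t end,
                  char *dst, size_t dst_cap) {
    assert(dst_cap > 0 && start <= end);
    size_t len = end - start;
    if (len >= dst_cap)
        len = dst_cap - 1;
    memcpy(dst, buf + start, len);
    dst[len] = '\0';
    return len;
}
```

In the input `a,b\ncafé,d\n`, the two bytes of `é` (0xC3 0xA9) pass through phase 1 without any decoding, the structural offsets come out as 1, 3, 9, and 11, and the field between offsets 3 and 9 is only copied when `copy_field` is called.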
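Why unsigned saturation, rather than simply keeping the low byte, is the right way to narrow UTF-16 for scanning can be shown with a scalar C sketch of the same semantics. This is an illustration, not the library's code: `pack_unsigned_saturate` and `utf16_structural_mask` are hypothetical names, and the first function emulates one element of x86 `PACKUSWB`, the instruction behind `Avx2.PackUnsignedSaturate`, which reads each 16-bit value as signed and clamps it to 0..255. The sketch processes code units one at a time in order, so it needs no permutation; the vectorized version does, because the 256-bit pack works within 128-bit lanes and interleaves the results of its two inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar emulation of one PACKUSWB element: the 16-bit unit is read
   as a signed value and clamped to the unsigned byte range 0..255. */
static uint8_t pack_unsigned_saturate(uint16_t unit) {
    if (unit >= 0x8000) return 0x00; /* negative as int16 -> clamps to 0 */
    if (unit > 0x00FF) return 0xFF;  /* 0x0100..0x7FFF -> clamps to 255 */
    return (uint8_t)unit;            /* 0x0000..0x00FF pass through */
}

/* Hypothetical helper: narrow up to 64 UTF-16 code units, then mark
   commas and newlines with the same byte comparisons a UTF-8 scan uses. */
uint64_t utf16_structural_mask(const uint16_t *units, size_t n) {
    assert(n <= 64);
    uint64_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = pack_unsigned_saturate(units[i]);
        if (b == ',' || b == '\n')
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

U+012C and U+010A have low bytes equal to `,` (0x2C) and `\n` (0x0A), so truncating to the low byte would report them as structural characters. Saturation maps both to 0xFF, and surrogate halves (0xD800 and above) to 0x00, so only genuine ASCII commas and newlines set bits in the mask.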