Hasty Briefs (beta)

Disassembling terabytes of random data with Zig and Capstone to prove a point

14 days ago
  • #Zig
  • #DEFLATE
  • #ARM
  • The article grew out of a disagreement over which is more likely to turn up in a stream of random bytes: valid ARM (Thumb mode) instructions, or a valid DEFLATE stream that inflates to Thumb instructions.
  • The author argues that random bytes are more likely to contain valid Thumb instructions than valid DEFLATE streams that inflate to Thumb instructions, because Thumb has high code density: most bit patterns decode to a valid instruction.
  • Experimental results show that successful disassembly of random bytes is over 125x more common than successful decompression, and over 350x more common than decompression followed by successful disassembly of the output.
  • About 89.3% of 2-byte sequences and 85.5% of 4-byte sequences disassemble as valid Thumb instructions, indicating high code density.
  • DEFLATE decompression of random bytes succeeds only about 0.5% of the time; most failures are caused by invalid headers or Huffman tree errors.
  • The author used Zig and Capstone to run Monte Carlo simulations, which showed that valid Thumb instructions are more likely to appear in random bytes than valid compressed data is.
  • The article concludes that Thumb's very high code density makes valid Thumb instructions more likely to occur in random byte sequences than compressed data that contains Thumb instructions.
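
The decompression half of the experiment summarized above can be approximated with a short Monte Carlo sketch. This is not the author's Zig and Capstone code: it is written in Python with the standard `zlib` module, and the function names, buffer length, trial count, and the definition of "success" (the inflater reaches the end of a raw DEFLATE stream without an error) are all assumptions made for illustration. The measured rate depends on those choices, so it should not be expected to match the article's figures exactly.

```python
import random
import zlib


def inflates_cleanly(buf: bytes) -> bool:
    """Return True if buf begins with a complete, valid raw DEFLATE stream.

    Hypothetical helper: this success criterion is an assumption, not
    necessarily the one used in the article.
    """
    # wbits=-15 selects raw DEFLATE (no zlib or gzip header, no checksum).
    inflater = zlib.decompressobj(wbits=-15)
    try:
        inflater.decompress(buf)
    except zlib.error:
        # Invalid block type, bad stored-block length, bad Huffman tree, etc.
        return False
    # eof is True only if the end of the final block was reached.
    return inflater.eof


def estimate_success_rate(trials: int, buf_len: int, seed: int = 0) -> float:
    """Estimate the fraction of random buffers that inflate cleanly."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = 0
    for _ in range(trials):
        buf = bytes(rng.getrandbits(8) for _ in range(buf_len))
        if inflates_cleanly(buf):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rate = estimate_success_rate(trials=100_000, buf_len=128)
    print(f"fraction of random buffers that inflate cleanly: {rate:.4%}")
```

A matching disassembly estimate would need the Capstone bindings, which are outside the standard library, so it is left out here.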