SBCL: The Assembly Code Breadboard (2014)

  • #performance optimization
  • #machine code
  • #stack VM
  • Correction to NEXT sequence: it previously encoded an effective address incorrectly, but this didn't affect instruction meaning, just caused a wasteful encoding.
  • Small stack sizes, like the F18's 10 slots with no overflow/underflow traps, reflect Chuck Moore's philosophy: if you need more, you're doing it wrong.
  • x87's rotating stack, as highlighted by djb, can reduce data shuffling, making it comparable or superior to registers with careful scheduling.
  • Exploring stack-based VMs with a fixed small stack (e.g., 8 slots) allows for implementation techniques like keeping everything in registers.
  • Pushing and popping by adjusting a modular TOS counter avoids data movement, mimicking the approach of the x87 and the F18.
  • Specializing primitives for all 8 stack counter values in stack VMs can improve performance, as primops are already duplicated for speed.
  • Modifications to the VM's NEXT sequence: encoding offsets instead of addresses reduces bytecode size, and placing primop variants at regular intervals simplifies variant selection during dispatch.
  • The VM is implemented with SBCL's assembler and SLIME's REPL, which make it easy to generate repetitive machine code.
  • Control flow primitives for jumps and conditionals (jmp, jnz, jz) enable loops and conditionals in the VM.
  • Immediate values are handled via the lit, inc, and dec primitives, which load data directly from the instruction stream.
  • Performance testing shows loops with fused operations (like djn) achieve near-native speeds, with specialized primops reducing overhead.
  • Specializing primops to a virtual stack pointer is feasible and effective for threaded interpreters, making fixed stack VMs a good runtime IR.
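
The points above about a modular TOS counter, per-counter primop variants, and regular-interval dispatch can be illustrated with a minimal sketch. It is written in Python rather than SBCL assembly, so it shows only the structure, not the performance. Everything named here (`specialise`, `TABLE`, `run`, the opcode numbering, and the exact semantics chosen for `lit`, `add`, `dec`, `djn`, and `halt`) is an assumption made for illustration, not the original post's code. A Python list stands in for the machine registers that would hold the 8 stack slots.

```python
STACK_SLOTS = 8  # fixed stack size, matching the 8-slot example above

# Hypothetical primitive set; opcodes are small integers, not addresses.
OPS = ("lit", "add", "dec", "djn", "halt")
OP = {name: i for i, name in enumerate(OPS)}


def specialise(name, t):
    """Return the variant of primitive `name` for TOS counter value `t`.

    Each variant takes (slots, code, pc), where pc points just past the
    opcode, and returns (next_pc, next_tos). A next_pc of None stops the VM.
    Slot indices are computed once, here, so a variant never moves data
    to push or pop: it only reports a different TOS counter.
    """
    top = t                        # slot holding the top of the stack
    below = (t - 1) % STACK_SLOTS  # slot holding the second element
    above = (t + 1) % STACK_SLOTS  # slot that a push writes to

    if name == "lit":    # push the immediate that follows the opcode
        def prim(slots, code, pc):
            slots[above] = code[pc]
            return pc + 1, above
    elif name == "add":  # pop two values and push their sum
        def prim(slots, code, pc):
            slots[below] += slots[top]
            return pc, below
    elif name == "dec":  # subtract an immediate from the top of the stack
        def prim(slots, code, pc):
            slots[top] -= code[pc]
            return pc + 1, top
    elif name == "djn":  # fused: decrement top, relative jump if non-zero
        def prim(slots, code, pc):
            slots[top] -= 1
            if slots[top] != 0:
                return pc + 1 + code[pc], top
            return pc + 1, top
    else:                # halt
        def prim(slots, code, pc):
            return None, top
    return prim


# Variants sit at regular intervals: the entry for (op, tos) is at
# index op * STACK_SLOTS + tos, so dispatch is one multiply-add.
TABLE = [specialise(name, t) for name in OPS for t in range(STACK_SLOTS)]


def run(code):
    """Run `code` and return (value in the top slot, primitives executed)."""
    slots = [0] * STACK_SLOTS
    pc, tos, steps = 0, 0, 0
    while pc is not None:
        op = code[pc]
        pc, tos = TABLE[op * STACK_SLOTS + tos](slots, code, pc + 1)
        steps += 1
    return slots[tos], steps


# 2 + 3, then a countdown loop driven by the fused djn primitive.
print(run([OP["lit"], 2, OP["lit"], 3, OP["add"], OP["halt"]]))  # (5, 4)
print(run([OP["lit"], 5, OP["djn"], -2, OP["halt"]]))            # (0, 7)
```

In this sketch, pushing past 8 values silently wraps around and overwrites the oldest slot, which mirrors the lack of overflow traps noted above. A real threaded implementation would replace the `TABLE` lookup with a computed jump into code laid out at fixed intervals.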