SBCL: The Assembly Code Breadboard (2014)
6 hours ago
- #performance optimization
- #machine code
- #stack VM
- Correction to the NEXT sequence: it previously encoded an effective address incorrectly. This did not change the instruction's meaning; it only made the encoding wasteful.
- Small stack sizes, like the F18's 10 slots with no overflow/underflow traps, reflect Chuck Moore's philosophy: if you need more, you're doing it wrong.
- x87's rotating stack, as highlighted by djb, can reduce data shuffling, making it comparable or superior to registers with careful scheduling.
- Restricting a stack-based VM to a small fixed-size stack (e.g., 8 slots) enables implementation techniques such as keeping the entire stack in registers.
- Pushing and popping by adjusting a modular top-of-stack (TOS) counter avoids data movement, mimicking the approach of the x87 and the F18.
- Specializing primitives for all 8 stack counter values in stack VMs can improve performance, as primops are already duplicated for speed.
- Modifications to the VM's NEXT sequence: the bytecode encodes offsets instead of addresses, which reduces its size, and primop variants sit at regular intervals, which simplifies variant selection during dispatch.
- The VM is implemented with SBCL's assembler, driven from SLIME's REPL, which makes it efficient to generate repetitive machine code.
- Control flow primitives for jumps and conditionals (jmp, jnz, jz) enable loops and conditionals in the VM.
- Immediate values are handled by the lit, inc, and dec primitives, which load data directly from the instruction stream.
- Performance testing shows loops with fused operations (like djn) achieve near-native speeds, with specialized primops reducing overhead.
- Specializing primops to a virtual stack pointer is feasible and effective for threaded interpreters, making fixed stack VMs a good runtime IR.
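
The ideas in the list above (a fixed 8-slot stack, a modular TOS counter, one primop variant per counter value, and variants laid out at regular intervals) can be sketched in a few lines of Python. This is only an illustrative model, not the SBCL assembly from the post: the opcode numbering, the `make_*` helpers, the `TABLE` layout, the relative jump operand, and the reading of `djn` as "decrement TOS and jump if non-zero" are all assumptions made for the example.

```python
STACK_SLOTS = 8            # fixed stack size, as in the 8-slot example
MASK = STACK_SLOTS - 1     # counter arithmetic is modulo 8

# Hypothetical opcode numbering, chosen for this sketch only.
LIT, ADD, DJN = 0, 1, 2


def make_lit(tos):
    """Variant of 'lit' specialized for one value of the TOS counter."""
    above = (tos + 1) & MASK           # slot the push lands in, fixed per variant
    def lit(slots, arg, pc):
        slots[above] = arg             # a push writes one slot; nothing is shuffled
        return above, pc
    return lit


def make_add(tos):
    """Variant of 'add' specialized for one value of the TOS counter."""
    below = (tos - 1) & MASK
    def add(slots, arg, pc):
        slots[below] += slots[tos]     # result replaces the second-from-top slot
        return below, pc               # a pop is just a counter decrement mod 8
    return add


def make_djn(tos):
    """Assumed semantics: decrement TOS, jump by a relative offset if non-zero."""
    def djn(slots, arg, pc):
        slots[tos] -= 1
        return tos, (pc + arg if slots[tos] != 0 else pc)
    return djn


# Variants sit at regular intervals: index = opcode * STACK_SLOTS + tos.
TABLE = [make(tos)
         for make in (make_lit, make_add, make_djn)
         for tos in range(STACK_SLOTS)]


def run(program):
    """Run a list of (opcode, operand) pairs; return (top of stack, all slots)."""
    slots = [0] * STACK_SLOTS
    tos = MASK                         # so the first push lands in slot 0
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        # Dispatch picks the variant from the opcode and the current counter.
        tos, pc = TABLE[op * STACK_SLOTS + tos](slots, arg, pc)
    return slots[tos], slots
```

For example, `run([(LIT, 2), (LIT, 3), (ADD, None)])` leaves 5 on top of the stack. Pushing a ninth value silently overwrites the oldest slot, which mirrors the absence of overflow traps mentioned for the F18. In a real threaded interpreter each variant would hard-code register names rather than list indices; the closure over `tos` stands in for that here.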