80386 Early Start Memory Access

6 hours ago
  • #FPGA
  • #80386
  • #Performance Optimization
  • Intel's 80386 featured an 'Early Start' mechanism to hide memory latency by initiating the next instruction's address calculations in the last cycle of the current instruction.
  • The z386 FPGA core, after adding Early Start and other optimizations, achieved ao486-class performance, with significant improvements in benchmarks like Doom (16.6 to 23.0 FPS) and 3DBench.
  • Early Start relies on microcode and forwarding networks to handle data hazards, including a corner case that causes the POPAD bug.
  • Implementing Early Start in z386 involved forwarding logic for registers and stack pointers, with effective addresses computed combinationally at the i_pop cycle.
  • Further performance gains came from tightening the store queue, issuing reads/writes earlier, splitting the cache (16KB+16KB), and implementing early branch redirect for direct relative branches.
  • A wider frontend with a 32-byte prefetch queue, single-cycle structural decoder, and improved refill bandwidth reduced decode-queue-empty stalls from ~20% to under 10%.
  • Maintaining a high clock speed (85 MHz) required optimizations like adder carry-chain fusion and timing cleanups without sacrificing CPI improvements.
  • z386 0.4 matches or exceeds ao486 performance, offering an alternative open-source x86 core, though it does not yet boot Windows and would benefit from bug reports and community contributions.
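
The forwarding idea behind Early Start can be illustrated with a small software model. This is only a sketch under stated assumptions, not z386's actual logic: the names `PendingWrite` and `early_effective_address` are hypothetical, and the model covers just the base + index*scale + disp form of an x86 effective address. It shows why the address calculation for the next instruction needs a forwarded value when the current instruction's register write has not yet landed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PendingWrite:
    """Hypothetical: a register result the current instruction has
    produced but not yet written back to the register file."""
    reg: str
    value: int


def early_effective_address(
    regs: Dict[str, int],
    base: Optional[str],
    index: Optional[str] = None,
    scale: int = 1,
    disp: int = 0,
    pending: Optional[PendingWrite] = None,
) -> int:
    """Compute base + index*scale + disp for the NEXT instruction while
    the current one is still in its last cycle. If a register used in
    the address matches the pending write, use the forwarded value
    instead of the stale register-file contents."""

    def read(name: str) -> int:
        if pending is not None and pending.reg == name:
            return pending.value  # forwarded, not yet in the register file
        return regs[name]

    addr = disp
    if base is not None:
        addr += read(base)
    if index is not None:
        addr += read(index) * scale
    return addr & 0xFFFFFFFF  # wrap to 32 bits


# The register file still holds the old EAX; the current instruction is
# about to write 0x2000 to it.
regs = {"eax": 0x1000, "ebx": 4}
pending = PendingWrite("eax", 0x2000)

stale = early_effective_address(regs, "eax", "ebx", scale=4, disp=8)
forwarded = early_effective_address(regs, "eax", "ebx", scale=4, disp=8,
                                    pending=pending)
```

Without forwarding, the early calculation would use the stale base register and produce the wrong address. A case the forwarding path does not cover would behave like the `stale` result here, which is the general shape of hazard the summary associates with the POPAD bug.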