Hasty Briefs (beta)

Characterizing Realistic Workloads on a Commercial Compute-in-SRAM Device

19 hours ago
  • #Compute-in-SRAM
  • #Energy Efficiency
  • #Hardware Optimization
  • Compute-in-SRAM architectures promise higher performance and energy efficiency for data-intensive applications.
  • Prior evaluations relied on simulators or small prototypes, so they offered limited insight into real-world behavior.
  • This work evaluates a commercial compute-in-SRAM device (GSI APU) against CPUs and GPUs.
  • An analytical framework is introduced to model performance trade-offs and guide optimizations.
  • Three optimizations are proposed: communication-aware reduction mapping, coalesced DMA, and broadcast-friendly data layouts.
  • Together, these optimizations speed up retrieval by 4.8×–6.6× over CPUs and improve end-to-end retrieval-augmented generation (RAG) latency by 1.1×–1.8×.
  • For RAG, the system matches the performance of an NVIDIA A6000 GPU while being 54.4×–117.9× more energy-efficient.
  • The findings validate compute-in-SRAM as viable for complex applications and offer guidance on optimization.
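
The gap between the retrieval speedup (4.8×–6.6×) and the end-to-end RAG speedup (1.1×–1.8×) is what Amdahl's law would predict when retrieval is only one stage of the pipeline. The sketch below is an illustration, not the paper's analytical framework: the function names are hypothetical, and pairing the lower and upper ends of the two ranges with each other is an assumption made here for the sake of the arithmetic.

```python
def end_to_end_speedup(fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the baseline
    runtime is accelerated by `stage_speedup` and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


def implied_fraction(overall_speedup: float, stage_speedup: float) -> float:
    """Invert Amdahl's law: the share of baseline runtime the accelerated
    stage must have occupied to produce `overall_speedup`."""
    if overall_speedup <= 0.0 or stage_speedup <= 0.0 or stage_speedup == 1.0:
        raise ValueError("speedups must be positive and stage_speedup != 1")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # Assumed pairing of the reported range endpoints (not stated in the source).
    for overall, retrieval in [(1.1, 4.8), (1.8, 6.6)]:
        share = implied_fraction(overall, retrieval)
        print(f"retrieval {retrieval}x, end-to-end {overall}x "
              f"-> retrieval share of baseline runtime: {share:.1%}")
```

Under that assumed pairing, the reported numbers are consistent with retrieval taking somewhere between roughly a tenth and half of the baseline end-to-end time, which is why a large retrieval speedup yields a more modest overall gain.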