Characterizing Realistic Workloads on a Commercial Compute-in-SRAM Device
21 hours ago
- #Compute-in-SRAM
- #Energy Efficiency
- #Hardware Optimization
- Compute-in-SRAM architectures enhance performance and energy efficiency for data-intensive applications.
- Prior evaluations were limited to simulators or small prototypes, so they offered little insight into real-world behavior.
- This work evaluates a commercial compute-in-SRAM device (GSI APU) against CPUs and GPUs.
- An analytical framework is introduced to model performance trade-offs and guide optimizations.
- Three optimizations are proposed: communication-aware reduction mapping, coalesced DMA, and broadcast-friendly data layouts.
- The optimizations speed up retrieval by 4.8×–6.6× over CPUs and improve end-to-end retrieval-augmented generation (RAG) latency by 1.1×–1.8×.
- The system matches the performance of an NVIDIA A6000 GPU for RAG while being 54.4×–117.9× more energy-efficient.
- Findings validate compute-in-SRAM viability for complex applications and provide optimization guidance.
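
The gap between the retrieval speedup (4.8×–6.6×) and the end-to-end RAG gain (1.1×–1.8×) is what one would expect when retrieval is only part of the pipeline. The sketch below uses Amdahl's law to illustrate that relationship. It is not the analytical framework from the work itself, and the function names and the sample retrieval fractions are assumptions made purely for illustration.

```python
def end_to_end_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: overall speedup when one stage, which takes
    `stage_fraction` of the baseline runtime, is accelerated by
    `stage_speedup` and everything else is unchanged."""
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be in [0, 1]")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


def implied_stage_fraction(overall_speedup: float, stage_speedup: float) -> float:
    """Invert Amdahl's law: the baseline runtime share of the accelerated
    stage that would produce `overall_speedup` given `stage_speedup`."""
    if stage_speedup <= 1.0 or overall_speedup < 1.0:
        raise ValueError("expected stage_speedup > 1 and overall_speedup >= 1")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # Hypothetical retrieval shares of baseline RAG latency (not from the source).
    for fraction in (0.1, 0.3, 0.5):
        # Retrieval speedups at the two ends of the reported range.
        for speedup in (4.8, 6.6):
            overall = end_to_end_speedup(fraction, speedup)
            print(f"retrieval share={fraction:.0%}, "
                  f"retrieval speedup={speedup}x -> end-to-end {overall:.2f}x")
```

Under this simple model, the end-to-end gain is capped by the share of latency spent in retrieval: the smaller that share, the less a faster retrieval stage can help, however large its own speedup.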