Lexar Wants to Offload Local AI Models to SSD Amid the RAMpocalypse
5 hours ago
- #SSD technology
- #AI hardware
- #memory optimization
- Lexar is developing SSD technology that offloads local AI models from DRAM to cheaper NAND flash, reducing memory costs.
- The Lexar AI Storage Core SSD can cut DRAM requirements by at least 40%, allowing larger LLMs to run on PCs with less RAM.
- In tests, the Qwen 3.5 122B model ran with only 32 GB of DRAM instead of 128 GB, and generated tokens faster than traditional methods.
- The technology also lets models run with larger context windows, such as 256K tokens, where traditional approaches fail, though latency increases with model size.
- A hot-swappable M.2 SSD design built around Lexar's custom DRAM-less SPU controller is showcased for mini PCs; it supports PCIe Gen 4 and Gen 5 for direct connection to the processor.
- Challenges include slower time-to-first-token and potential NAND flash wear from frequent model updates, and it remains debated whether the approach saves money over DRAM in the long term.
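
The DRAM figures quoted above can be checked with a little arithmetic. The snippet below is only an illustrative sketch: the function name `dram_reduction` is hypothetical, and the inputs are the 128 GB and 32 GB figures from the Qwen 3.5 122B test as reported in this summary, not independent measurements.

```python
def dram_reduction(baseline_gb: float, offloaded_gb: float) -> float:
    """Return the fraction of DRAM saved when going from baseline_gb to offloaded_gb."""
    if baseline_gb <= 0:
        raise ValueError("baseline_gb must be positive")
    return (baseline_gb - offloaded_gb) / baseline_gb


# Figures reported for the Qwen 3.5 122B test (taken from the summary above).
baseline_gb = 128
offloaded_gb = 32

saving = dram_reduction(baseline_gb, offloaded_gb)
print(f"DRAM saving: {saving:.0%}")
print(f"Consistent with the 'at least 40%' claim: {saving >= 0.40}")
```

For this one reported test the saving works out to 75%, which sits above the "at least 40%" floor; other models and context sizes may land elsewhere in that range.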