Lexar Wants to Offload Local AI Models to SSD Amid the RAMpocalypse

5 hours ago

Lexar is developing SSD technology to offload local AI models from DRAM to cheaper NAND Flash to reduce memory costs.
The Lexar AI Storage Core SSD can cut DRAM requirements by at least 40%, allowing larger LLMs to run on PCs with less RAM.
In tests, running the Qwen 3.5 122B model required only 32 GB of DRAM instead of 128 GB, with improved token generation speeds compared to traditional methods.
The technology also allows models to run with larger context windows, such as 256K tokens, where traditional approaches fail, though latency increases with model size.
A hot-swappable M.2 SSD design with Lexar's custom SPU DRAM-less controller is showcased for Mini-PCs, supporting PCIe Gen 4 and Gen 5 for direct processor connections.
Challenges include a slower time-to-first-token and potential NAND Flash wear from frequent model updates. Whether the approach saves money over DRAM in the long term is still debated.
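
As a quick sanity check on the figures above, the reported drop from 128 GB to 32 GB of DRAM can be compared with the claimed "at least 40%" reduction. This is only an arithmetic sketch: the function name `dram_reduction` is illustrative and not part of any Lexar tooling.

```python
def dram_reduction(baseline_gb: float, reduced_gb: float) -> float:
    """Return the fractional DRAM saving when going from baseline_gb to reduced_gb."""
    return (baseline_gb - reduced_gb) / baseline_gb


# Figures reported for the Qwen 3.5 122B test: 128 GB baseline, 32 GB with offloading.
saving = dram_reduction(128, 32)

# The headline claim is a reduction of at least 40%.
meets_claim = saving >= 0.40

print(f"DRAM saving: {saving:.0%}, meets 40% floor: {meets_claim}")
```

On these numbers the saving works out to 75%, well above the stated 40% minimum.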
