Generating Images with a 2025 Android
2 days ago
- #Machine Learning Porting
- #Android Image Generation
- #NPU Optimization
- Successfully generated images on a Samsung Galaxy S25+ using PrismML's Bonsai Image model on the phone's Hexagon NPU.
- The Android port was harder than the iOS one: Android's machine-learning stack is less mature, it has no mature equivalent of Core ML or MLX, and hardware varies enough that the model has to be optimized separately for the CPU, GPU, or NPU.
- The CPU baseline, built on a stable-diffusion.cpp fork, was slow (8 to 9 minutes per 512×512 image), which prompted exploring the GPU (limited to 256×256) and the NPU for better performance.
- The NPU implementation had to work around weight expansion, fp16 overflows, and SDK quirks. It produces 512×512 images in about 2 minutes (roughly 20 s of prompt encoding, 65 s of NPU denoising, and 45 s of decoding).
- NPU-generated images were softer than the iPhone's because the model runs in mixed precision (fp16 and lower-precision integers), which costs fine detail. The deployable bundle was also larger (10.7 GB vs. 3.7 GB).
- The project did not produce a fully tappable Android app because of app-to-NPU communication issues, but it is a useful starting point for future work, including possible optimizations of the text encoder and VAE.
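
To make the "fp16 overflows" item above concrete, here is a minimal sketch of how a half-precision accumulator can overflow and how pre-scaling the inputs avoids it. It is an illustration only: the helper names (`to_fp16`, `fp16_dot`), the sample values, and the scaling workaround are assumptions, not the method used in the port. fp16 tops out at 65504, so a running sum that passes that value becomes infinity.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; return +/-inf if it does not fit."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def fp16_dot(xs, ws, scale: float = 1.0) -> float:
    """Dot product with every intermediate rounded to fp16.

    Hypothetical workaround: inputs are multiplied by `scale` before the
    fp16 arithmetic, and the result is divided by `scale` afterwards in
    full precision.
    """
    acc = 0.0
    for x, w in zip(xs, ws):
        term = to_fp16(to_fp16(x * scale) * w)
        acc = to_fp16(acc + term)
    return acc / scale


# Made-up activations and weights whose true dot product is 120000.
xs = [300.0] * 4
ws = [100.0] * 4

unscaled = fp16_dot(xs, ws)             # accumulator passes 65504 and saturates
scaled = fp16_dot(xs, ws, scale=0.125)  # stays inside the fp16 range

print("unscaled:", unscaled)
print("scaled:  ", scaled)
```

Scaling trades range for precision: the smaller intermediates fit in fp16, but each one is rounded more coarsely relative to the final result, which is one general reason reduced-precision pipelines can lose fine detail.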