SpikingBrain: Spiking Brain-Inspired Large Models
- #Neuromorphic Computing
- #AI Models
- #SpikingBrain
- SpikingBrain integrates hybrid efficient attention, MoE modules, and spike encoding, achieving performance comparable to mainstream models while using less than 2% of the usual pre-training data.
- Adapted to non-NVIDIA (MetaX) GPU clusters, with stable large-scale training and inference and over 100× speedup in time to first token (TTFT) for 4M-token sequences.
- Spike encoding yields over 69% micro-level sparsity, complemented by macro-level MoE sparsity, offering guidance for next-generation neuromorphic chip design.
- Includes the full implementation and weights of SpikingBrain-7B, with HuggingFace, vLLM, and quantized versions for flexible deployment.
- The vllm-hymeta plugin provides modular backend integration with a decoupled codebase, reducing maintenance overhead and making it faster to bring up new backends (see the vLLM sketch after this list).
- The W8ASpike quantized version reduces inference cost using pseudo-spiking activations and is suited to prototyping and research (an illustrative encoding sketch follows this list).
- Model weights available on ModelScope for pre-trained, chat, and quantized versions.
- Example scripts are provided for running the models with the HuggingFace and vLLM frameworks; a minimal loading sketch appears after this list.
- Performance evaluations show advantages over baselines on CMMLU and C-Eval benchmarks.
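As a rough illustration of the HuggingFace deployment path mentioned above, the sketch below loads a SpikingBrain checkpoint with Transformers and generates a short completion. The model path is a placeholder rather than an official identifier, and `trust_remote_code=True` is assumed because the architecture ships custom modeling code; consult the project's ModelScope or HuggingFace pages for the actual repository names and scripts.

```python
# Minimal sketch: loading a SpikingBrain checkpoint with HuggingFace Transformers.
# The model path is a placeholder; substitute the actual repo or local directory
# from the project's release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/SpikingBrain-7B"  # placeholder, not an official identifier

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,   # assumes a GPU with bf16 support
    trust_remote_code=True,       # custom architecture ships with the checkpoint
    device_map="auto",            # requires the accelerate package
)

prompt = "Explain spiking neural networks in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```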
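For the vLLM path, the sketch below assumes the vllm-hymeta plugin has already been installed so that vLLM can resolve the custom architecture; the model path is again a placeholder, and serving options (tensor parallelism, quantized weights) would follow the repository's own scripts.

```python
# Minimal sketch: offline generation through vLLM, assuming the vllm-hymeta
# plugin is installed so the SpikingBrain architecture is recognized.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/SpikingBrain-7B", trust_remote_code=True)  # placeholder path
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain spiking neural networks in one sentence."], params)
print(outputs[0].outputs[0].text)
```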
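The pseudo-spiking idea behind W8ASpike and the reported micro-level sparsity can be illustrated, purely as a toy under assumed choices of threshold and rounding, by converting dense activations into integer spike counts and measuring how many positions emit no spikes at all. This is not the repository's actual quantization code.

```python
# Illustrative sketch only: threshold-based "spike count" encoding of activations
# and a micro-level sparsity measurement. Threshold and rounding are assumptions
# for demonstration, not the W8ASpike implementation.
import torch

def pseudo_spike_encode(x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Quantize activations to non-negative integer spike counts; values below
    the threshold emit zero spikes, which is where the sparsity comes from."""
    return torch.clamp(torch.round(x / threshold), min=0)

def sparsity(counts: torch.Tensor) -> float:
    """Fraction of positions that emit no spikes."""
    return (counts == 0).float().mean().item()

x = torch.relu(torch.randn(4, 1024))           # toy post-activation tensor
spikes = pseudo_spike_encode(x, threshold=1.0)
print(f"micro-level sparsity: {sparsity(spikes):.1%}")
# A downstream layer would work with roughly `spikes * threshold` instead of x.
```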