SpikingBrain: Spiking Brain-Inspired Large Models
- #Neuromorphic Computing
- #AI Models
- #SpikingBrain
- SpikingBrain integrates hybrid efficient attention, MoE modules, and spike encoding, achieving performance comparable to mainstream models while using less than 2% of the usual pre-training data.
- Adapted to non-NVIDIA (MetaX) GPU clusters, with stable large-scale training and inference and over 100× speedup in time to first token (TTFT) for 4M-token sequences.
- Spike encoding yields over 69% micro-level sparsity, complemented by macro-level MoE sparsity, offering guidance for next-generation neuromorphic chip design.
- Includes the full implementation and weights of SpikingBrain-7B, with HuggingFace, vLLM, and quantized versions for flexible deployment.
- The vllm-hymeta plugin provides modular backend integration with a decoupled codebase, reducing maintenance overhead and making it faster to bring up new backends (see the vLLM sketch after this list).
- The W8ASpike quantized version reduces inference cost using pseudo-spiking activations and is suited to prototyping and research (an illustrative encoding sketch follows this list).
- Model weights available on ModelScope for pre-trained, chat, and quantized versions.
- Example scripts are provided for running the models with the HuggingFace and vLLM frameworks; a minimal loading sketch appears after this list.
- Performance evaluations show advantages over baselines on CMMLU and C-Eval benchmarks.
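As a rough illustration of the HuggingFace deployment path mentioned above, the sketch below loads a SpikingBrain checkpoint with Transformers and generates a short completion. The model path is a placeholder rather than an official identifier, and `trust_remote_code=True` is assumed because the architecture ships custom modeling code; consult the project's ModelScope or HuggingFace pages for the actual repository names and scripts.

```python
# Minimal sketch: loading a SpikingBrain checkpoint with HuggingFace Transformers.
# The model path is a placeholder; substitute the actual repo or local directory
# from the project's release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/SpikingBrain-7B"  # placeholder, not an official identifier

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,   # assumes a GPU with bf16 support
    trust_remote_code=True,       # custom architecture ships with the checkpoint
    device_map="auto",            # requires the accelerate package
)

prompt = "Explain spiking neural networks in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```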
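For the vLLM path, the sketch below assumes the vllm-hymeta plugin has already been installed so that vLLM can resolve the custom architecture; the model path is again a placeholder, and serving options (tensor parallelism, quantized weights) would follow the repository's own scripts.

```python
# Minimal sketch: offline generation through vLLM, assuming the vllm-hymeta
# plugin is installed so the SpikingBrain architecture is recognized.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/SpikingBrain-7B", trust_remote_code=True)  # placeholder path
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain spiking neural networks in one sentence."], params)
print(outputs[0].outputs[0].text)
```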
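The pseudo-spiking idea behind W8ASpike and the reported micro-level sparsity can be illustrated, purely as a toy under assumed choices of threshold and rounding, by converting dense activations into integer spike counts and measuring how many positions emit no spikes at all. This is not the repository's actual quantization code.

```python
# Illustrative sketch only: threshold-based "spike count" encoding of activations
# and a micro-level sparsity measurement. Threshold and rounding are assumptions
# for demonstration, not the W8ASpike implementation.
import torch

def pseudo_spike_encode(x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Quantize activations to non-negative integer spike counts; values below
    the threshold emit zero spikes, which is where the sparsity comes from."""
    return torch.clamp(torch.round(x / threshold), min=0)

def sparsity(counts: torch.Tensor) -> float:
    """Fraction of positions that emit no spikes."""
    return (counts == 0).float().mean().item()

x = torch.relu(torch.randn(4, 1024))           # toy post-activation tensor
spikes = pseudo_spike_encode(x, threshold=1.0)
print(f"micro-level sparsity: {sparsity(spikes):.1%}")
# A downstream layer would work with roughly `spikes * threshold` instead of x.
```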