
SpikingBrain: Spiking Brain-Inspired Large Models

  • #Neuromorphic Computing
  • #AI Models
  • #SpikingBrain
  • SpikingBrain integrates hybrid efficient attention, MoE modules, and spike encoding, achieving performance comparable to mainstream models with less than 2% of the training data (see the MoE routing sketch after this list).
  • Adapted to non-NVIDIA (MetaX) clusters, with stable large-scale training and inference and over 100× speedup in time to first token (TTFT) on 4M-token sequences.
  • Spike encoding delivers over 69% micro-level sparsity, complemented by macro-level MoE sparsity, offering guidance for next-generation neuromorphic chip design; a toy spike-encoding sketch follows the list.
  • Includes full implementation and weights of SpikingBrain-7B, with HuggingFace, vLLM, and quantized versions for flexible deployment.
  • The vLLM-hymeta plugin offers modular backend integration and a decoupled codebase, reducing maintenance burden and making it faster to add new backends (serving sketch below).
  • The W8ASpike quantized version cuts inference cost via pseudo-spiking activations, making it suitable for prototyping and research (see the quantization sketch below).
  • Model weights available on ModelScope for pre-trained, chat, and quantized versions.
  • Example scripts are provided for running the models with the HuggingFace and vLLM frameworks; a minimal loading sketch follows this list.
  • Performance evaluations show advantages over baselines on CMMLU and C-Eval benchmarks.
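
The macro-level sparsity mentioned above comes from MoE routing: only a handful of experts are activated per token, leaving the rest idle. Below is a minimal, hypothetical top-k routing sketch in NumPy; the expert count, k, and gating details are illustrative assumptions, not SpikingBrain's actual configuration.

```python
import numpy as np

def top_k_route(router_logits: np.ndarray, k: int = 2):
    """Pick the top-k experts per token; all other experts stay idle,
    which is the macro-level sparsity that MoE provides."""
    top = np.argsort(router_logits, axis=-1)[..., -k:]
    weights = np.take_along_axis(router_logits, top, axis=-1)
    # Softmax over the selected experts only.
    weights = np.exp(weights) / np.exp(weights).sum(axis=-1, keepdims=True)
    return top, weights

logits = np.random.default_rng(2).normal(size=(4, 8))  # 4 tokens, 8 experts
experts, gates = top_k_route(logits, k=2)
print(experts)  # only 2 of 8 experts are activated per token (75% idle)
```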
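Micro-level sparsity is a property of the spike encoding itself: activations below a firing threshold emit no spikes. The sketch below uses a toy threshold/spike-count encoder only to show how such sparsity can be measured; the threshold and the exact encoding are assumptions, and the toy figure will not match the reported 69%.

```python
import numpy as np

def spike_count_encode(x: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Encode activations as non-negative integer spike counts.

    Values below the firing threshold produce zero spikes, so the encoded
    tensor is sparse; larger activations fire more spikes. (Hypothetical
    scheme for illustration, not SpikingBrain's exact encoder.)
    """
    return np.floor(np.maximum(x, 0.0) / threshold).astype(np.int32)

rng = np.random.default_rng(0)
activations = rng.normal(loc=0.0, scale=1.0, size=(4, 1024))

spikes = spike_count_encode(activations, threshold=1.0)
sparsity = np.mean(spikes == 0)  # fraction of positions emitting no spikes
print(f"micro-level sparsity: {sparsity:.1%}")
```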
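For vLLM serving, the standard offline-inference API should apply once the checkpoint and any required plugin are installed. The model path below is a placeholder, and the assumption that vllm-hymeta registers itself through vLLM's plugin mechanism is ours, not confirmed by the summary.

```python
# Requires: pip install vllm (plus, hypothetically, the vllm-hymeta plugin,
# which vLLM would discover through its plugin entry points).
from vllm import LLM, SamplingParams

# Placeholder path; substitute the actual SpikingBrain-7B weights.
llm = LLM(model="path/to/SpikingBrain-7B", trust_remote_code=True)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain spiking neural networks in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```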
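To make "W8 plus pseudo-spiking" concrete: weights are stored as int8 with per-channel scales, while activations are approximated by spike counts computed in a single pass rather than unrolled over time steps. The sketch below is our reading of that recipe, not the repository's W8ASpike code; the threshold and scaling choices are illustrative.

```python
import numpy as np

def quantize_w8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization (the 'W8' part)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def pseudo_spike(x: np.ndarray, threshold: float) -> np.ndarray:
    """Pseudo-spiking: approximate activations by integer spike counts
    computed in one shot, with no time-step unrolling (illustrative)."""
    return np.floor(np.maximum(x, 0.0) / threshold)

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))   # weight matrix
x = rng.normal(size=(16,))     # post-ReLU-style activations

threshold = 0.5
q, scale = quantize_w8(w)
s = pseudo_spike(x, threshold)

# Integer-friendly matmul, then undo both scalings.
y_approx = (q.astype(np.float32) @ s) * scale.squeeze() * threshold
y_exact = w @ np.maximum(x, 0.0)
print("max abs error:", np.abs(y_approx - y_exact).max())
```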
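Finally, a minimal HuggingFace loading sketch, assuming the released checkpoint ships custom modeling code (hence trust_remote_code=True); the model path is a placeholder, not the actual repo id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/SpikingBrain-7B"  # placeholder for the released weights
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What is a spiking neural network?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```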