Hasty Briefs

Sarvam 105B, the first competitive Indian open-source LLM

6 hours ago
  • #AI
  • #OpenSource
  • #IndiaAI
  • Sarvam releases two open-source models: Sarvam 30B and Sarvam 105B, trained from scratch in India.
  • Both models are optimized for reasoning, programming, and agentic tasks, with strong performance on Indian language benchmarks.
  • Sarvam 30B is designed for real-time deployment, powering the conversational agent platform Samvaad.
  • Sarvam 105B excels in complex reasoning and agentic workflows, powering the AI assistant Indus.
  • The models use a Mixture-of-Experts (MoE) Transformer architecture for efficient training and deployment (see the routing sketch after this list).
  • Pre-training used large corpora (16T tokens for the 30B model, 12T for the 105B) with an emphasis on reasoning and multilingual content.
  • Supervised fine-tuning drew on high-quality prompts, with dedicated safety tuning for India-specific risk scenarios.
  • Reinforcement learning combined a diverse prompt pool with adaptive sampling to keep the training signal effective.
  • Benchmarks show Sarvam 105B outperforming comparable models in knowledge, reasoning, and agentic tasks.
  • Sarvam 30B performs well on coding and reasoning benchmarks while remaining efficient to deploy.
  • Tokenizer efficiency is optimized for Indian languages, reducing cost and latency (see the tokenizer comparison below).
  • Inference optimizations include kernel-level rewrites and advanced scheduling for high throughput (a serving sketch follows the list).
  • Demos showcase practical applications, including webpage generation, tutoring, and competitive programming.
  • The models are available via API and can be downloaded from AI Kosh and Hugging Face (see the loading example below).
  • The release is framed as a step toward building sovereign AI infrastructure in India.
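
For readers unfamiliar with MoE routing, here is a minimal sketch of a top-k MoE feed-forward layer in PyTorch. It illustrates the general technique only; the layer sizes, expert count, and routing details are illustrative assumptions, not Sarvam's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-k routing."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # learned gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gate_probs = F.softmax(self.router(x), dim=-1)
        weights, idx = torch.topk(gate_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts, so per-token
        # compute scales with k, not with the total number of experts.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

x = torch.randn(16, 512)
print(MoELayer()(x).shape)  # torch.Size([16, 512])
```

Because each token activates only `top_k` experts, total parameter count can grow without a proportional increase in per-token compute, which is the efficiency argument behind MoE training and serving.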
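The tokenizer-efficiency claim can be checked directly by counting tokens per word ("fertility") on Indian-language text: lower fertility means fewer tokens per request, hence lower cost and latency. A rough sketch using Hugging Face `transformers`; `sarvamai/sarvam-1` is Sarvam's earlier public model, used as a stand-in because the summary does not give the new checkpoints' repo ids.

```python
from transformers import AutoTokenizer

text = "भारत एक विशाल और विविधतापूर्ण देश है।"  # Hindi: "India is a vast and diverse country."

# gpt2 is an English-centric baseline; sarvamai/sarvam-1 is an earlier
# Sarvam release standing in for the new models' tokenizers.
for repo in ["gpt2", "sarvamai/sarvam-1"]:
    tok = AutoTokenizer.from_pretrained(repo)
    n_tokens = len(tok.encode(text))
    n_words = len(text.split())
    print(f"{repo}: {n_tokens} tokens / {n_words} words = fertility {n_tokens / n_words:.2f}")
```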
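The throughput bullet refers to engine-level work (custom kernels, scheduling). As a point of reference only, this is how such models are commonly served with vLLM, which implements paged attention and continuous batching; nothing in the summary confirms vLLM is Sarvam's stack, and the repo id is a placeholder.

```python
from vllm import LLM, SamplingParams

# Placeholder repo id; substitute the published checkpoint name.
llm = LLM(model="sarvamai/sarvam-30b", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches incoming prompts continuously, keeping GPU utilization high.
outputs = llm.generate(["Summarize Mixture-of-Experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```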
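Finally, a minimal download-and-generate example with `transformers`. The repo id is a hypothetical placeholder; the actual checkpoint names are listed on Sarvam's Hugging Face organization and on AI Kosh.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sarvamai/sarvam-105b"  # hypothetical id; check the official listing

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "भारत की राजधानी क्या है?"}]  # "What is the capital of India?"
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```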