Sarvam 105B, the first competitive Indian open-source LLM
- #AI
- #OpenSource
- #IndiaAI
- Sarvam releases two open-source models: Sarvam 30B and Sarvam 105B, trained from scratch in India.
- Both models are optimized for reasoning, programming, and agentic tasks, with strong performance on Indian language benchmarks.
- Sarvam 30B is designed for real-time deployment, powering the conversational agent platform Samvaad.
- Sarvam 105B excels in complex reasoning and agentic workflows, powering the AI assistant Indus.
- Both models use a Mixture-of-Experts (MoE) Transformer architecture for efficient training and deployment (a minimal layer sketch follows this list).
- Pre-training used 16T tokens for the 30B model and 12T tokens for the 105B model, with a focus on reasoning and multilingual content.
- Supervised fine-tuning drew on curated high-quality prompts, with additional safety fine-tuning for India-specific risk scenarios.
- Reinforcement learning used diverse prompts with adaptive sampling, so that training effort concentrates on prompts that still carry learning signal (see the sampling sketch below).
- Benchmarks show Sarvam 105B outperforming comparable models in knowledge, reasoning, and agentic tasks.
- Sarvam 30B performs well on coding and reasoning benchmarks while remaining optimized for efficient deployment.
- The tokenizer is optimized for Indian languages, so Indic text encodes into fewer tokens, cutting cost and latency (see the fertility check below).
- Inference optimizations include kernel-level rewrites and advanced scheduling for high throughput (a scheduling sketch follows this list).
- Demos showcase practical applications, including webpage generation, tutoring, and competitive programming.
- The models are available via API and can be downloaded from AI Kosh and Hugging Face (a loading example follows this list).
- The conclusion frames the models as a step toward sovereign AI infrastructure in India.
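
The MoE bullet above maps to a standard pattern: each token is routed to a small subset of expert feed-forward networks. Here is a minimal sketch of a top-k routed MoE block in PyTorch; the expert count, hidden sizes, and `top_k` are illustrative assumptions, not Sarvam's published configuration.

```python
# Minimal top-k routed MoE feed-forward block. Expert count, sizes, and top_k
# here are illustrative assumptions, not Sarvam's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

y = MoEFeedForward()(torch.randn(16, 512))  # only 2 of 8 experts run per token
```

Only `top_k` experts run per token, which is how an MoE model carries a large total parameter count while keeping per-token compute close to a much smaller dense model.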
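
The post does not say which adaptive sampling scheme the RL stage used; one common form in RL post-training is to oversample candidate prompts and keep only those whose rollout group has a mixed pass rate, since all-correct or all-wrong groups yield zero advantage. The `generate` and `grade` callables below are hypothetical stand-ins for the policy sampler and the reward checker.

```python
# One common form of adaptive sampling in RL post-training (the post does not
# say which scheme Sarvam used): keep only prompts whose rollout group has a
# mixed pass rate. `generate` and `grade` are hypothetical stand-ins.
from typing import Callable, List, Tuple

def sample_informative_batch(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # prompt, n -> n completions
    grade: Callable[[str, str], float],         # (prompt, completion) -> reward in {0, 1}
    group_size: int = 8,
    batch_size: int = 4,
) -> List[Tuple[str, List[str], List[float]]]:
    batch = []
    for prompt in prompts:
        completions = generate(prompt, group_size)
        rewards = [grade(prompt, c) for c in completions]
        pass_rate = sum(rewards) / group_size
        if 0.0 < pass_rate < 1.0:               # mixed outcomes carry gradient signal
            batch.append((prompt, completions, rewards))
        if len(batch) == batch_size:
            break
    return batch
```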
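
Tokenizer efficiency on Indic text can be eyeballed by measuring fertility (tokens per word): the lower the fertility, the fewer tokens each request consumes. Both repo IDs below are assumptions for illustration; substitute the official checkpoints once the model cards are live.

```python
# Rough fertility check (tokens per word) on Hindi text. Both repo IDs are
# assumptions for illustration, not confirmed checkpoint names.
from transformers import AutoTokenizer

hindi = "भारत एक विशाल और विविधतापूर्ण देश है।"

for repo_id in ["sarvamai/sarvam-30b", "meta-llama/Llama-3.1-8B"]:  # hypothetical IDs
    tok = AutoTokenizer.from_pretrained(repo_id)
    n_tokens = len(tok.encode(hindi, add_special_tokens=False))
    fertility = n_tokens / len(hindi.split())
    print(f"{repo_id}: {n_tokens} tokens, fertility {fertility:.2f}")
```

Lower fertility means fewer tokens per request, which is where the claimed cost and latency savings come from.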
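
The post only names "advanced scheduling" without detail; continuous batching is the dominant scheduling technique in modern LLM serving, so treating it as such is an assumption. A skeletal loop, with the model forward pass omitted:

```python
# Skeletal continuous-batching loop; reading "advanced scheduling" as
# continuous batching is an assumption. The model forward pass is omitted.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int
    generated: int = 0

def serve(waiting: deque, token_budget: int = 4096):
    running = []
    while waiting or running:
        used = sum(r.prompt_tokens + r.generated for r in running)
        # Admit queued requests whenever the per-step token budget allows;
        # always admit at least one so an oversized prompt cannot stall the loop.
        while waiting and (not running or used + waiting[0].prompt_tokens <= token_budget):
            req = waiting.popleft()
            running.append(req)
            used += req.prompt_tokens
        for r in running:        # one decode step per running request
            r.generated += 1
        # Retire finished requests immediately, freeing budget for the queue.
        running = [r for r in running if r.generated < r.max_new_tokens]

serve(deque([Request(128, 32), Request(64, 16), Request(512, 64)]))
```

Requests join and leave the batch every step instead of waiting for the whole batch to drain, which is what keeps throughput high under mixed-length traffic.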
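
For the Hugging Face download path, a minimal loading sketch via `transformers`, assuming standard library support; the repo ID is a guess pending the official model card.

```python
# Minimal loading sketch via transformers; the repo ID is a hypothetical
# placeholder pending the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sarvamai/sarvam-105b"  # hypothetical repo ID
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("List three rivers of India.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```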