Open-source framework for real-time AI voice
10 months ago
- #AI Agents
- #Python SDK
- #Real-time Communication
- The AI Agent SDK is a Python framework for developing real-time multimodal conversational AI agents.
- It lets AI agents join VideoSDK rooms as participants, so they can exchange voice and media with the other participants in real time.
- Key features include real-time communication (audio/video), SIP & telephony integration, virtual avatars, multi-model support, and conversational flow management.
- The SDK supports integration with various AI models and tools like OpenAI, Gemini, AWS NovaSonic, and more.
- Setup requires a VideoSDK authentication token, meeting ID, Python 3.12+, and third-party API keys for services like OpenAI or Google.
- Developers can create custom voice agents by inheriting from the base Agent class and defining function tools for extended capabilities.
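- The framework supports both external function tools (standalone functions handed to the agent) and internal function tools (methods defined on the agent class), each decorated with @function_tool.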
- Example use cases include hospital appointment booking, flight and hotel inquiries, weather queries with avatars, and e-commerce scenarios.
- Contributions are welcome, including feedback, bug fixes, new features, plugins, and documentation improvements.
- A guide is available for creating custom STT, LLM, or TTS plugins to expand the framework's capabilities.
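
The agent-and-tools pattern described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the SDK's actual API: the `Agent` base class and `function_tool` decorator below are hypothetical stand-ins written for this example, the tool names and canned return values are invented, and the tools are synchronous for brevity. It assumes that "external" tools are standalone functions passed to the agent and "internal" tools are methods on the agent subclass.

```python
import inspect
from typing import Callable, List, Optional


def function_tool(func: Callable) -> Callable:
    """Hypothetical stand-in decorator: flags a callable as a tool."""
    func._is_tool = True
    return func


class Agent:
    """Hypothetical stand-in base class, NOT the SDK's real Agent."""

    def __init__(self, instructions: str, tools: Optional[List[Callable]] = None):
        self.instructions = instructions
        self._tools = {}
        # External tools: decorated standalone functions passed in by the caller.
        for tool in tools or []:
            if getattr(tool, "_is_tool", False):
                self._tools[tool.__name__] = tool
        # Internal tools: decorated methods discovered on the subclass.
        for name, member in inspect.getmembers(self, predicate=inspect.ismethod):
            if getattr(member, "_is_tool", False):
                self._tools[name] = member

    def tool_names(self) -> List[str]:
        """Return the registered tool names in alphabetical order."""
        return sorted(self._tools)

    def call_tool(self, name: str, **kwargs):
        """Look up a registered tool by name and invoke it."""
        return self._tools[name](**kwargs)


@function_tool
def get_weather(city: str) -> dict:
    """External tool (illustrative): returns canned data, no network call."""
    return {"city": city, "forecast": "sunny"}


class BookingAgent(Agent):
    """Custom agent created by inheriting from the stand-in base class."""

    def __init__(self):
        super().__init__(
            instructions="Help callers book hospital appointments.",
            tools=[get_weather],
        )

    @function_tool
    def book_appointment(self, patient: str, slot: str) -> dict:
        """Internal tool (illustrative): defined as a method on the agent."""
        return {"patient": patient, "slot": slot, "status": "booked"}


agent = BookingAgent()
print(agent.tool_names())
print(agent.call_tool("book_appointment", patient="Ana", slot="10:00"))
```

In the real framework the model decides when to call a tool during a conversation. Here `call_tool` is invoked by hand only to show that both kinds of tool end up in one registry.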