I Open-Sourced My AI Toy Company That Runs on ESP32 and OpenAI Realtime API
a year ago
- Realtime AI speech powered by the OpenAI Realtime API, ESP32, Secure WebSockets, and Deno Edge Functions, for uninterrupted conversations of up to 10 minutes from anywhere in the world.
- Install the Supabase CLI and set up your local Supabase backend.
- Set up your Next.js frontend with its environment variables and run the development server.
- Add your ESP32-S3 device's MAC address on the Settings page of the Next.js frontend.
- Add your OpenAI API key to the server-deno/.env and frontend-nextjs/.env.local files.
- Start the Deno server with its environment variables loaded.
- Set up your ESP32 Arduino client and configure the device's Wi-Fi.
- ElatoAI consists of three main components: Frontend Client, Edge Server Functions, and ESP32 IoT Client.
- Features include Realtime Speech-to-Speech, Custom AI Agents, Customizable Voices, Secure WebSockets, Server VAD Turn Detection, Opus Audio Compression, Global Edge Performance, ESP32 Arduino Framework, Conversation History, Device Management, User Authentication, Conversations over WebRTC and WebSockets, Volume Control, and Realtime Transcripts.
- Technologies used: Frontend (Next.js, Vercel), Backend (Supabase DB), Edge Functions (Deno Edge Functions on Deno/Supabase), IoT Client (PlatformIO, Arduino Framework, ESP32-S3), Audio Codec (Opus), Communication (Secure WebSockets), Libraries (ArduinoJson, WebSockets, AsyncWebServer, ESP32_Button, Arduino Audio Tools, ArduinoLibOpus).
- Performance metrics: Latency (under 1 s round trip globally), Audio Quality (Opus codec at 24 kbps), Uninterrupted Conversations (up to 10 minutes), Global Availability (optimized through edge computing on Deno).
- Security features: Secure WebSockets (WSS) for encrypted data transfer, optional API key encryption with 256-bit AES, Supabase DB for secure authentication, and Supabase Row Level Security (RLS) on all tables.
- Limitations: a 3-4 s cold start when connecting to the edge server, uninterrupted conversations limited to 10 minutes, the edge server stops once its wall-clock time limit is exceeded, and no speech interruption detection on the ESP32.
- Future improvements: speech interruption detection on the ESP32, Arduino IDE support, and tool-calling support in the Deno edge functions.
- Contribution guidelines: fork the repository, create a feature branch, commit your changes, push to the branch, and open a PR.
- Project is licensed under the MIT License.
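The setup step that registers the ESP32-S3 by MAC address only works if the string typed into the Settings page matches what the device reports. Below is a minimal sketch of a hypothetical `normalizeMac` helper. It assumes the device reports its MAC in the upper-case, colon-separated form (the format the ESP32 Arduino core's `WiFi.macAddress()` produces) and that the project compares MACs as plain strings; neither detail is confirmed by the project itself.

```typescript
// Hypothetical helper: normalize a user-entered MAC address to the
// upper-case, colon-separated form "AA:BB:CC:DD:EE:FF".
// Returns null when the input is not a 48-bit MAC address.
function normalizeMac(input: string): string | null {
  // Drop surrounding whitespace and common separators (":", "-", ".").
  const hex = input.trim().replace(/[:\-.]/g, "");
  // A MAC address is exactly 12 hexadecimal digits (6 bytes).
  if (!/^[0-9a-fA-F]{12}$/.test(hex)) {
    return null;
  }
  // Split into byte pairs and join them with colons.
  const pairs = hex.toUpperCase().match(/.{2}/g) ?? [];
  return pairs.join(":");
}

console.log(normalizeMac("a4-cf-12-0b-3c-9d")); // A4:CF:12:0B:3C:9D
console.log(normalizeMac("not-a-mac")); // null
```

Normalizing on input, rather than at lookup time, keeps a single canonical form in the database.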
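The setup steps put the OpenAI API key in server-deno/.env and frontend-nextjs/.env.local before the servers start. A small pre-flight check can catch a missing or empty key early. This is a minimal sketch. The variable name `OPENAI_API_KEY` is an assumption, so check the repository's example env files for the real names. The parser handles only simple `KEY=value` lines, not every dotenv feature.

```typescript
// Minimal .env parser: handles KEY=value lines, # comments,
// and one pair of surrounding quotes. Not a full dotenv implementation.
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching single or double quotes.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Return the required keys that are absent or empty.
function missingKeys(vars: Map<string, string>, required: string[]): string[] {
  return required.filter((k) => !vars.get(k));
}

// Illustrative file contents; OPENAI_API_KEY is an assumed variable name.
const sample = '# server-deno/.env\nOPENAI_API_KEY="sk-your-key"\n';
console.log(missingKeys(parseDotEnv(sample), ["OPENAI_API_KEY"])); // []
```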
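"Server VAD Turn Detection" means the OpenAI Realtime API, not the ESP32, decides when the user has finished speaking. The sketch below builds the `session.update` event an edge function could send to enable this. Its field names (`turn_detection`, `server_vad`, `threshold`, `prefix_padding_ms`, `silence_duration_ms`, `input_audio_format`) follow the beta Realtime API documentation. The numeric values, the `alloy` voice, and the `pcm16` formats are illustrative assumptions. Newer API versions may nest these settings differently, so check the current reference before relying on this shape.

```typescript
// Turn-detection settings as described for the beta Realtime API session.
interface ServerVad {
  type: "server_vad";
  threshold: number; // speech-probability threshold, 0.0 to 1.0
  prefix_padding_ms: number; // audio kept from just before detected speech
  silence_duration_ms: number; // trailing silence that ends the user's turn
}

interface SessionUpdateEvent {
  type: "session.update";
  session: {
    instructions: string;
    voice: string;
    input_audio_format: "pcm16";
    output_audio_format: "pcm16";
    turn_detection: ServerVad;
  };
}

// Build the JSON text sent over the WebSocket to the Realtime API.
function buildSessionUpdate(instructions: string, voice = "alloy"): string {
  const update: SessionUpdateEvent = {
    type: "session.update",
    session: {
      instructions,
      voice,
      input_audio_format: "pcm16",
      output_audio_format: "pcm16",
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      },
    },
  };
  return JSON.stringify(update);
}

console.log(buildSessionUpdate("You are a friendly talking toy."));
```

A longer `silence_duration_ms` suits children who pause mid-sentence, at the cost of a slower reply. A shorter value is more responsive but may cut the speaker off.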
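The 24 kbps Opus figure in the performance metrics is easier to appreciate next to uncompressed audio. The arithmetic below assumes a baseline of 24 kHz, 16-bit mono PCM, 20 ms Opus frames, and a constant 24 kbps bitrate. Real Opus streams vary in bitrate, and WebSocket framing overhead is ignored, so treat the results as rough estimates.

```typescript
// Back-of-the-envelope audio bandwidth for one device, one direction.
// Assumptions: 24 kHz 16-bit mono PCM baseline, 20 ms Opus frames,
// constant 24 kbps Opus bitrate, no transport overhead.
const SAMPLE_RATE_HZ = 24_000;
const BITS_PER_SAMPLE = 16;
const OPUS_BITRATE_BPS = 24_000;
const FRAME_MS = 20;

const pcmBitrateBps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE; // uncompressed bits per second
const compressionRatio = pcmBitrateBps / OPUS_BITRATE_BPS; // PCM size / Opus size
const opusBytesPerFrame = (OPUS_BITRATE_BPS * FRAME_MS) / 1000 / 8; // bytes per 20 ms frame

// Total Opus payload in bytes for a conversation of the given length.
function opusBytesFor(minutes: number): number {
  return (OPUS_BITRATE_BPS * minutes * 60) / 8;
}

console.log(pcmBitrateBps, compressionRatio, opusBytesPerFrame, opusBytesFor(10));
// 384000 16 60 1800000
```

Under these assumptions, a full 10-minute conversation moves about 1.8 MB of audio in each direction, which is 16 times less than raw PCM.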
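Because conversations are limited to 10 minutes and the edge server stops once its wall-clock limit is exceeded, a server can track each session's remaining time and warn the user before the cut-off. The sketch below uses a hypothetical `sessionBudget` helper. Only the 10-minute limit comes from the project; the 30-second warning window is an illustrative choice.

```typescript
// Hypothetical helper: how much of a session's time budget remains,
// and whether it is time to warn the user before the hard cut-off.
interface SessionBudget {
  remainingMs: number;
  expired: boolean;
  shouldWarn: boolean;
}

const SESSION_LIMIT_MS = 10 * 60 * 1000; // 10-minute conversation limit
const WARN_BEFORE_MS = 30 * 1000; // illustrative 30 s warning window

function sessionBudget(
  startedAtMs: number,
  nowMs: number,
  limitMs: number = SESSION_LIMIT_MS,
  warnBeforeMs: number = WARN_BEFORE_MS,
): SessionBudget {
  // Clamp at zero so late checks never report negative time.
  const remainingMs = Math.max(0, limitMs - (nowMs - startedAtMs));
  return {
    remainingMs,
    expired: remainingMs === 0,
    shouldWarn: remainingMs > 0 && remainingMs <= warnBeforeMs,
  };
}

const start = Date.now();
console.log(sessionBudget(start, start + 9 * 60 * 1000 + 45 * 1000));
// { remainingMs: 15000, expired: false, shouldWarn: true }
```

Passing the current time in explicitly, instead of calling `Date.now()` inside the function, keeps the helper deterministic and easy to test.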