SSE sucks for transporting LLM tokens
7 days ago
- #SSE
- #LLM
- #PubSub
- SSE (Server-Sent Events) is criticized as a poor transport for LLM tokens because it is unreliable: a stream cannot be resumed after the connection drops.
- Key issues with SSE include the need to restart model inference from scratch if the connection drops, leading to poor user experience and increased costs.
- SSE is unidirectional, so the client cannot steer a response mid-stream. The only way to cancel is to close the connection, and the server cannot tell that intentional disconnect apart from an accidental one.
- WebSockets do not solve the core problem of resuming from disconnections, as they also require restarting model inference upon reconnection.
- A Pub/Sub model is suggested as a better alternative. Inference publishes tokens to a channel that is independent of any single client connection, so a client can reconnect and resume consuming tokens without re-running inference, though transport may cost more.
- The article highlights the trade-off between the cost of transport mechanisms and the quality of user experience, with SSE being cheap but unreliable.
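The resume and cancel behaviour summarized above can be sketched in a few lines of Python. This is a minimal in-process illustration under stated assumptions, not the article's implementation: `TokenChannel`, `run_inference`, and the integer-offset resume scheme are hypothetical names, and a real deployment would put the token log in an actual pub/sub service rather than a Python list. The point it shows is that inference writes into the log once, a dropped client resubscribes from the last offset it saw, and cancellation is an explicit message rather than an ambiguous disconnect.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class TokenChannel:
    """Hypothetical per-response channel: an append-only log of tokens."""

    tokens: list[str] = field(default_factory=list)
    done: bool = False
    cancelled: bool = False
    _cond: threading.Condition = field(default_factory=threading.Condition, repr=False)

    def publish(self, token: str) -> None:
        # Append a token and wake any waiting subscribers.
        with self._cond:
            self.tokens.append(token)
            self._cond.notify_all()

    def close(self) -> None:
        # Mark the stream finished; no more tokens will be published.
        with self._cond:
            self.done = True
            self._cond.notify_all()

    def cancel(self) -> None:
        # Explicit cancel signal, distinct from a subscriber simply going away.
        with self._cond:
            self.cancelled = True
            self._cond.notify_all()

    def subscribe(self, offset: int = 0):
        """Yield (offset, token) pairs after `offset`, blocking until more arrive."""
        while True:
            with self._cond:
                self._cond.wait_for(lambda: offset < len(self.tokens) or self.done)
                batch = self.tokens[offset:]
                finished = self.done
            # Yield outside the lock so a slow consumer never blocks the producer.
            for token in batch:
                offset += 1
                yield offset, token
            if finished:
                return


def run_inference(channel: TokenChannel, model_tokens) -> None:
    # Stand-in for a model: `model_tokens` is any iterable of token strings.
    for tok in model_tokens:
        if channel.cancelled:
            break  # only an explicit cancel stops generation
        channel.publish(tok)
    channel.close()


# Inference runs exactly once, regardless of how many times clients connect.
channel = TokenChannel()
run_inference(channel, ["Pub", "/", "Sub", " resumes", "."])

first = []
for offset, tok in channel.subscribe():
    first.append(tok)
    if offset == 2:
        break  # simulate a dropped connection after two tokens

# Reconnect and continue from the last offset seen, without re-running inference.
resumed = [tok for _, tok in channel.subscribe(offset=2)]
print("".join(first + resumed))  # Pub/Sub resumes.
```

In a live setting `run_inference` would run in its own thread or process while clients call `subscribe` concurrently; the condition variable lets subscribers block until new tokens or the end of the stream arrive. Keeping the whole log is what makes resumption possible, and it is also where the extra transport and storage cost mentioned above comes from.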