The inner workings of TCP zero-copy
5 hours ago
- #TCP Optimization
- #Zero-Copy
- #Linux Kernel
- TCP zero-copy is a Linux kernel feature that lets applications send and receive data without copying it between userspace and kernel memory.
- Send-side zero-copy arrived in 2017 (Linux 4.14) with the MSG_ZEROCOPY flag for sendmsg(): once a socket opts in with the SO_ZEROCOPY socket option, the kernel references the userspace buffer directly instead of copying it.
- The kernel relies on scatter-gather DMA so that packet headers in kernel memory and payload pages in user memory can be sent together; if the NIC lacks this support, the kernel falls back to copying.
- Applications must leave the buffers unmodified until the kernel signals, through a notification on the socket error queue (read with recvmsg() and MSG_ERRQUEUE), that they can be reused.
- io_uring has supported zero-copy TCP sends since 2022 (Linux 6.0): io_uring_prep_send_zc() queues an asynchronous send, and a separate completion notification reports when the buffer is free.
- Receive-side zero-copy is more complex: the NIC must support TCP header split, so that headers land in kernel memory while payloads land in memory bound to the receive queue through page_pool.
- Memory regions for zero-copy receive must be registered in advance, and flow steering rules are needed so that the target connection's traffic reaches the hardware queue bound to those buffers.
- Device memory (for example, GPU or storage memory) can also serve as the zero-copy target; it is registered by passing a DMA-buf file descriptor.
- Transmit-side device-memory zero-copy was added in 2025; driver support is still limited, and there is no io_uring integration yet.
- Zero-copy TCP can raise throughput by 30-40% for high-speed bulk transfers, but it offers less benefit in latency-sensitive workloads.
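The send-side opt-in and fallback summarized above can be sketched in C as follows, assuming a Linux TCP socket. SO_ZEROCOPY, MSG_ZEROCOPY and the ENOBUFS failure mode come from the kernel's MSG_ZEROCOPY interface; the `zc_*` helpers and the `ZC_MIN_BYTES` cutoff are hypothetical (the kernel documentation suggests zero-copy generally pays off only for writes of roughly 10 KB or more, which loosely matches the bulk-transfer point above).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallback values from the Linux UAPI headers, for older libc headers. */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Hypothetical cutoff: smaller writes are cheaper to copy than to pin. */
#define ZC_MIN_BYTES (10 * 1024)

/* Opt the socket in to zero-copy sends. Returns 1 on success, 0 if the
 * kernel or socket type rejects SO_ZEROCOPY (sends will then copy). */
int zc_enable(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) == 0;
}

/* Choose send() flags for one write of len bytes. */
int zc_send_flags(size_t len, int zc_enabled)
{
    return (zc_enabled && len >= ZC_MIN_BYTES) ? MSG_ZEROCOPY : 0;
}

/* Send len bytes. If MSG_ZEROCOPY was used and succeeded, buf must not be
 * modified until the matching completion is read from the error queue. */
ssize_t zc_send(int fd, const void *buf, size_t len, int zc_enabled)
{
    int flags = zc_send_flags(len, zc_enabled);
    ssize_t n = send(fd, buf, len, flags);
    if (n < 0 && errno == ENOBUFS && flags != 0) {
        /* Zero-copy refused (e.g. optmem or locked-page limit): copy instead. */
        n = send(fd, buf, len, 0);
    }
    return n;
}
```

Calling zc_enable() before connect() is the conservative choice; if it returns 0, zc_send() simply behaves like an ordinary copying send().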
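The rule that buffers stay untouched until the error queue says otherwise can be sketched as below, assuming the documented MSG_ZEROCOPY notification format: each successful zero-copy send on a socket takes the next 32-bit sequence number, and each notification is a `struct sock_extended_err` with origin SO_EE_ORIGIN_ZEROCOPY whose inclusive range [ee_info, ee_data] names the sends whose buffers are free, with SO_EE_CODE_ZEROCOPY_COPIED set in ee_code when the kernel fell back to copying. `struct zc_tracker`, `ZC_SLOTS` and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <time.h>             /* struct timespec, used by <linux/errqueue.h> */
#include <linux/errqueue.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical limit on zero-copy sends in flight per socket. A power of
 * two, so id % ZC_SLOTS stays consistent when the 32-bit id wraps. */
#define ZC_SLOTS 64

struct zc_tracker {
    uint32_t next_id;              /* sequence number of the next zero-copy send */
    unsigned char busy[ZC_SLOTS];  /* 1 while that send's buffer is pinned */
    unsigned copied;               /* notifications reporting a copy fallback */
};

/* Record a successful MSG_ZEROCOPY send; returns its sequence number. */
uint32_t zc_track_send(struct zc_tracker *t)
{
    uint32_t id = t->next_id++;
    t->busy[id % ZC_SLOTS] = 1;
    return id;
}

/* Apply one error-queue entry. Returns how many buffers were released,
 * or -1 if the entry is not a zero-copy notification. */
int zc_apply_completion(struct zc_tracker *t, const struct sock_extended_err *serr)
{
    if (serr->ee_errno != 0 || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
        return -1;
    int released = 0;
    /* [ee_info, ee_data] is inclusive and may wrap past UINT32_MAX. */
    for (uint32_t id = serr->ee_info;; id++) {
        t->busy[id % ZC_SLOTS] = 0;
        released++;
        if (id == serr->ee_data)
            break;
    }
    if (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)
        t->copied++;
    return released;
}

/* Read at most one notification without blocking. Returns buffers
 * released, or -1 if nothing relevant was queued. */
int zc_poll_completion(int fd, struct zc_tracker *t)
{
    union { char buf[128]; struct cmsghdr align; } control;
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);
    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return -1;
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if ((cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_RECVERR) ||
            (cm->cmsg_level == IPPROTO_IPV6 && cm->cmsg_type == IPV6_RECVERR)) {
            struct sock_extended_err serr;
            memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
            return zc_apply_completion(t, &serr);
        }
    }
    return -1;
}
```

Because the kernel may coalesce several completions into one notification, the tracker releases a whole range at a time rather than one buffer per notification.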
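For the io_uring path, a request prepared with io_uring_prep_send_zc() normally yields two completions that share the same user_data: a result CQE carrying the bytes sent (flagged IORING_CQE_F_MORE when a notification will follow), and a notification CQE flagged IORING_CQE_F_NOTIF once the kernel has released the buffer. The sketch below assumes those semantics and tracks them per request; `struct zc_req` and its helpers are hypothetical, and submitting the SQE and reaping CQEs (for example with liburing's io_uring_wait_cqe()) are left out.

```c
#include <stdint.h>
#include <linux/io_uring.h>

/* Flag values from <linux/io_uring.h>, in case older headers lack them. */
#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE (1U << 1)
#endif
#ifndef IORING_CQE_F_NOTIF
#define IORING_CQE_F_NOTIF (1U << 3)
#endif

/* Hypothetical per-request state; the SQE's user_data would point here. */
struct zc_req {
    int32_t result;   /* bytes sent, or a negative errno */
    int have_result;  /* the send-result CQE has arrived */
    int buf_free;     /* the kernel no longer references the buffer */
};

/* Feed one CQE belonging to a send_zc request into its state. */
void zc_req_on_cqe(struct zc_req *r, const struct io_uring_cqe *cqe)
{
    if (cqe->flags & IORING_CQE_F_NOTIF) {
        /* Notification CQE: the buffer may now be reused or freed. */
        r->buf_free = 1;
        return;
    }
    /* Result CQE: outcome of the send itself. */
    r->result = cqe->res;
    r->have_result = 1;
    /* Without F_MORE, no notification follows and the buffer is already free. */
    if (!(cqe->flags & IORING_CQE_F_MORE))
        r->buf_free = 1;
}

/* A request can be retired once its result is known and its buffer is free. */
int zc_req_done(const struct zc_req *r)
{
    return r->have_result && r->buf_free;
}
```

The handler does not depend on the order in which the two CQEs are reaped; a request is retired only when both conditions hold.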