Hunting a 34-year-old pointer bug in EtherSLIP (DOS Networking)
2 days ago
- #DOS Networking
- #EtherSLIP Bug
- #Memory Corruption
- The author revisited SLIP connections between DOS and Linux, noting that EtherSLIP emulates Ethernet for compatibility with DOS packet drivers.
- During testing, Telnet ran slowly and dropped packets because of a hardware issue, and on exit it displayed a 'NULL assignment detected' error from the Open Watcom runtime.
- The error indicated that the first 32 bytes of the data segment had been corrupted, so the author added trace points and wrote a nullCheck function to check that memory while the program ran.
- Debugging showed that the corruption occurred after a packet driver interrupt call, and that six of the corrupted bytes matched the MAC address of the Ethernet device simulated by EtherSLIP.
- Analysis of EtherSLIP's assembly code uncovered a bug in its simulated ARP responses: the segment registers were set incorrectly, so data was written to the wrong memory location.
- The faulty code copied MAC addresses while handling ARP. DS was erroneously copied into ES, so the writes landed near the start of the data segment instead of in the intended buffer.
- Factors like infrequent ARP requests, compiler optimizations, and memory layout variations allowed the bug to remain hidden for years.
- The author fixed the bug by removing the incorrect segment copy and suggested improvements to mTCP to avoid ARP requests on SLIP connections.
- The incident underscores the importance of heeding warnings from the compiler and its runtime, and of debugging them thoroughly, to prevent obscure issues in legacy systems.
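The mechanics summarized above (a guard check over the first 32 bytes of the data segment, and a six-byte MAC copy that lands in the wrong place once ES is overwritten with DS) can be modeled in a few lines. The following is a minimal sketch in Python, not the author's code and not EtherSLIP's assembly: the segment values, the DI offset, the MAC address, and the names `null_check`, `write_mac`, and `run` are all hypothetical, and the model assumes that DS happened to select the segment whose first 32 bytes are being watched.

```python
MEM_SIZE = 1 << 20   # 1 MiB real-mode address space
GUARD_LEN = 32       # bytes at the start of the data segment that are watched

# Hypothetical values, chosen only for illustration.
DATA_SEG = 0x2000    # segment assumed to be in DS when the copy runs
BUF_SEG = 0x3000     # segment of the receive buffer the reply should go to
BUF_OFF = 0x0008     # offset (DI) of the destination inside that buffer
MAC = bytes([0x02, 0x00, 0x00, 0x00, 0x00, 0x01])  # made-up MAC address


def linear(seg, off):
    """Real-mode translation: segment * 16 + 16-bit offset, wrapped to 20 bits."""
    return ((seg << 4) + (off & 0xFFFF)) & (MEM_SIZE - 1)


def guard_bytes(mem, data_seg):
    """Return the first GUARD_LEN bytes of the given data segment."""
    start = linear(data_seg, 0)
    return bytes(mem[start:start + GUARD_LEN])


def null_check(mem, data_seg, snapshot):
    """True if the guard area still matches the snapshot taken at startup."""
    return guard_bytes(mem, data_seg) == snapshot


def write_mac(mem, mac, ds, es, di, buggy):
    """Copy six MAC bytes to ES:DI. The buggy path first overwrites ES with DS."""
    if buggy:
        es = ds  # models the erroneous segment copy described above
    for i, b in enumerate(mac):
        mem[linear(es, di + i)] = b


def run(buggy):
    """Simulate one ARP reply and report (guard intact?, guard bytes, buffer bytes)."""
    mem = bytearray(MEM_SIZE)
    snapshot = guard_bytes(mem, DATA_SEG)
    write_mac(mem, MAC, ds=DATA_SEG, es=BUF_SEG, di=BUF_OFF, buggy=buggy)
    buf_start = linear(BUF_SEG, BUF_OFF)
    return (null_check(mem, DATA_SEG, snapshot),
            guard_bytes(mem, DATA_SEG),
            bytes(mem[buf_start:buf_start + len(MAC)]))


if __name__ == "__main__":
    for buggy in (True, False):
        intact, guard, buf = run(buggy)
        print("buggy" if buggy else "fixed",
              "guard intact:", intact,
              "guard:", guard.hex(),
              "buffer:", buf.hex())
```

Calling a check like `null_check` after each suspect operation, rather than only at exit, is what narrows the corruption down to a single call. In this model the offset in DI is small, so the stray write falls inside the watched 32 bytes; with a larger offset the same bug would corrupt memory silently, which is consistent with the summary's point that memory layout helped the bug stay hidden.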