
Optical Context Compression Is Just (Bad) Autoencoding

3 days ago
  • #Context Compression
  • #Computer Vision
  • #Language Models
  • DeepSeek-OCR shows high-fidelity text reconstruction from a small number of vision tokens, sparking interest in vision-based context compression for language models.
  • The study questions two assumptions: that vision-based compression offers unique advantages for text reconstruction, and that it is useful for language modeling.
  • Simple alternatives, namely parameter-free mean pooling and a learned hierarchical encoder, match or surpass vision-based methods at text reconstruction and outperform them at language modeling (see the sketch after this list).
  • Vision-based compression fails to outperform simply truncating the context in language modeling, suggesting the current excitement may be premature.
  • Code and checkpoints are available for further exploration.
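
To make the mean-pooling baseline from the list above concrete, here is a minimal sketch of what parameter-free pooling of token embeddings looks like, assuming the context is already embedded as a `(seq_len, hidden_dim)` PyTorch tensor. The function name, the 10x compression ratio, and the 4096-dimensional embeddings are illustrative assumptions, not taken from the released code.

```python
import torch


def mean_pool_compress(embeddings: torch.Tensor, ratio: int) -> torch.Tensor:
    """Average each non-overlapping window of `ratio` consecutive token
    embeddings into one vector, giving a `ratio`-times shorter sequence.

    embeddings: (seq_len, hidden_dim) tensor of token embeddings.
    returns:    (ceil(seq_len / ratio), hidden_dim) tensor.
    """
    seq_len, hidden_dim = embeddings.shape
    pad = (-seq_len) % ratio  # tokens needed to fill the last window
    if pad:
        embeddings = torch.cat(
            [embeddings, embeddings.new_zeros(pad, hidden_dim)], dim=0
        )
    groups = embeddings.view(-1, ratio, hidden_dim)
    # Divide by the true number of tokens in each window so the zero
    # padding in the final window does not bias its mean.
    counts = torch.full((groups.shape[0], 1), float(ratio))
    if pad:
        counts[-1] = ratio - pad
    return groups.sum(dim=1) / counts


# Example: a 1,000-token context compressed 10x into 100 pooled vectors.
tokens = torch.randn(1000, 4096)  # placeholder embeddings
compressed = mean_pool_compress(tokens, ratio=10)
print(compressed.shape)           # torch.Size([100, 4096])
```

The truncation baseline mentioned in the last finding is even simpler: keep only the most recent tokens that fit the budget, e.g. `tokens[-100:]`, with no pooling at all.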