Glyph: Scaling Context Windows via Visual-Text Compression

6 months ago
  • #long-context
  • #compression
  • #multimodal
  • Glyph is a framework for scaling context length through visual-text compression.
  • It renders long textual sequences into images and processes them using vision–language models (VLMs).
  • Glyph transforms long-context modeling into a multimodal problem, reducing computational and memory costs.
  • Delivers competitive performance on LongBench and MRCR while achieving significant token compression and inference speedup.
  • Includes a demo script for comparing Glyph with baseline text models like Qwen3/GLM4.
  • Built on GLM-4.1V-9B-Base, with the fine-tuned model available on Hugging Face.
  • Supports vLLM acceleration for improved throughput and response speed in long-context scenarios.
  • Evaluation scripts are provided for benchmarks including LongBench, MRCR, and RULER.
  • Known limitations include sensitivity to rendering parameters and OCR-related challenges.
  • Citation provided for research use.
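
To make the compression idea above concrete, here is a rough back-of-the-envelope sketch in Python. It does not use Glyph's code or its real rendering settings: the `RenderConfig` class, the page size, glyph size, patch size, and the 4-characters-per-token rule of thumb are all illustrative assumptions. It only shows how rendering parameters could determine how many visual tokens a long text costs compared with its plain-text token count.

```python
import math
import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderConfig:
    """Hypothetical rendering settings (assumptions, not Glyph's defaults)."""
    page_width_px: int = 896
    page_height_px: int = 1344
    char_width_px: int = 7     # assumed monospace glyph width
    line_height_px: int = 14   # assumed line height
    patch_px: int = 28         # assumed image area covered by one visual token

    @property
    def chars_per_line(self) -> int:
        return self.page_width_px // self.char_width_px

    @property
    def lines_per_page(self) -> int:
        return self.page_height_px // self.line_height_px

    @property
    def visual_tokens_per_page(self) -> int:
        # One visual token per patch_px x patch_px tile of the page.
        return (self.page_width_px // self.patch_px) * (
            self.page_height_px // self.patch_px
        )


def paginate(text: str, cfg: RenderConfig) -> list[list[str]]:
    """Wrap text into lines, then group lines into pages.

    textwrap.wrap collapses whitespace, so the original line breaks
    are not preserved in this simplified layout.
    """
    lines = textwrap.wrap(text, width=cfg.chars_per_line)
    n = cfg.lines_per_page
    return [lines[i:i + n] for i in range(0, len(lines), n)]


def estimate_compression(
    text: str, cfg: RenderConfig, chars_per_text_token: float = 4.0
) -> tuple[int, int, float]:
    """Return (text_tokens, visual_tokens, ratio) under the stated assumptions."""
    pages = paginate(text, cfg)
    text_tokens = math.ceil(len(text) / chars_per_text_token)
    visual_tokens = len(pages) * cfg.visual_tokens_per_page
    ratio = text_tokens / visual_tokens if visual_tokens else 0.0
    return text_tokens, visual_tokens, ratio


if __name__ == "__main__":
    sample = "word " * 10_000
    for char_w, line_h in [(7, 14), (5, 10)]:
        cfg = RenderConfig(char_width_px=char_w, line_height_px=line_h)
        t, v, r = estimate_compression(sample, cfg)
        print(f"glyph {char_w}x{line_h}px: text={t} visual={v} ratio={r:.2f}")
```

Shrinking the assumed glyph size packs more characters into each page and raises the estimated ratio, which also hints at why the limitations above mention sensitivity to rendering parameters: smaller text is cheaper in tokens but harder for the model to read.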