Glyph: Scaling Context Windows via Visual-Text Compression

6 months ago

Glyph is a framework for scaling context length through visual-text compression.
It renders long textual sequences into images and processes them using vision–language models (VLMs).
Glyph transforms long-context modeling into a multimodal problem, reducing computational and memory costs.
Achieves competitive performance on LongBench and MRCR while delivering significant compression and inference speedup.
Includes a demo script for comparing Glyph with baseline text models like Qwen3/GLM4.
Built on GLM-4.1V-9B-Base, with the fine-tuned model available on Hugging Face.
Supports vLLM acceleration for improved throughput and response speed in long-context scenarios.
Evaluation scripts are provided for benchmarks including LongBench, MRCR, and RULER.
Known limitations include sensitivity to rendering parameters and OCR-related challenges.
Citation provided for research use.
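
The summary above says long text is rendered into images and read by a VLM, and that results are sensitive to rendering parameters. The sketch below illustrates why the parameters matter by estimating how many visual tokens a rendered document might cost compared with its text tokens. It is not Glyph's code: RenderConfig, its default values, and the one-token-per-patch rule are assumptions made for illustration, and the text token count has to come from whichever tokenizer is being compared.

```python
import math
import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderConfig:
    """Hypothetical rendering parameters; none of these values come from Glyph."""
    chars_per_line: int = 100   # characters that fit on one rendered line
    lines_per_page: int = 60    # text lines per page image
    page_width_px: int = 896    # page image width in pixels
    page_height_px: int = 1232  # page image height in pixels
    patch_px: int = 28          # assumed side of the pixel square behind one visual token


def paginate(text: str, cfg: RenderConfig) -> list[list[str]]:
    """Wrap text to the line width and split the wrapped lines into pages."""
    lines: list[str] = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for a blank paragraph; keep an empty line instead.
        lines.extend(textwrap.wrap(paragraph, cfg.chars_per_line) or [""])
    step = cfg.lines_per_page
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def visual_tokens_per_page(cfg: RenderConfig) -> int:
    """Assume one visual token per patch_px x patch_px square of the page image."""
    cols = math.ceil(cfg.page_width_px / cfg.patch_px)
    rows = math.ceil(cfg.page_height_px / cfg.patch_px)
    return cols * rows


def estimate_compression(text: str, text_token_count: int, cfg: RenderConfig) -> float:
    """Return text tokens divided by estimated visual tokens (above 1.0 means fewer visual tokens)."""
    pages = paginate(text, cfg)
    if not pages:
        raise ValueError("text is empty; nothing to render")
    return text_token_count / (len(pages) * visual_tokens_per_page(cfg))
```

Under these assumptions, fitting more characters on each page (a smaller font or tighter line spacing) raises the ratio, but at some point the glyphs become too small for the model to read reliably, which is the trade-off behind the rendering-sensitivity and OCR limitations noted above.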
