Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
10 months ago
- #natural language processing
- #end-to-end models
- #machine learning
- Introduces a dynamic chunking mechanism for end-to-end hierarchical sequence modeling.
- Replaces the tokenization-LM-detokenization pipeline with a single H-Net model.
- Operating at the byte level, H-Net outperforms Transformer models and scales better with data.
- Shows increased character-level robustness and learns meaningful chunking strategies without supervision.
- Demonstrates significant improvements in languages and modalities with weaker tokenization heuristics.
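
To make "dynamic chunking" concrete, here is a minimal sketch of one way chunk boundaries could be chosen from the data rather than from a fixed tokenizer. It assumes that a boundary is placed wherever adjacent per-byte vectors are dissimilar. The summary above does not describe H-Net's actual boundary rule, so the similarity rule, the 0.5 threshold, the `TOY_VECTORS` table, and every function name below are hypothetical illustrations, not the paper's implementation. In a real model the vectors would be learned, context-dependent hidden states.

```python
import math

# Hypothetical stand-in for learned per-byte representations.
TOY_VECTORS = {
    ord("a"): [1.0, 0.0],
    ord("b"): [0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def boundary_probs(vectors):
    """Assumed rule: dissimilar neighbours give a high boundary probability."""
    probs = [1.0]  # the first position always starts a chunk
    for prev, cur in zip(vectors, vectors[1:]):
        probs.append(0.5 * (1.0 - cosine(prev, cur)))
    return probs

def chunk(items, vectors, threshold=0.5):
    """Start a new chunk wherever the boundary probability reaches the threshold."""
    chunks = []
    for item, p in zip(items, boundary_probs(vectors)):
        if p >= threshold or not chunks:
            chunks.append([item])
        else:
            chunks[-1].append(item)
    return chunks

data = b"aaabba"
vectors = [TOY_VECTORS[byte] for byte in data]
chunks = chunk(list(data), vectors)
print([bytes(c) for c in chunks])
```

With this toy table, runs of identical bytes fall into the same chunk, so the chunk lengths vary with the input instead of being fixed in advance. A hierarchical model would then process one vector per chunk at the next level.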