I unified convolution and attention into a single framework
8 hours ago
- #deep learning
- #generalization
- #neural operations
- Introduces the Generalized Windowed Operation (GWO), a single formulation that unifies deep learning operations such as matrix multiplication and convolution.
- Decomposes operations into three components: Path (operational locality), Shape (geometric structure), and Weight (feature importance).
- Proposes the Principle of Structural Alignment: generalization is optimal when the GWO configuration mirrors the data's intrinsic structure.
- Links the Principle of Structural Alignment to the Information Bottleneck (IB) principle.
- Defines an Operational Complexity metric based on Kolmogorov complexity, emphasizing the quality of an operation's complexity over its quantity.
- Argues that adaptive alignment with data structure leads to superior generalization bounds.
- Shows that canonical operations and their variants are optimal solutions to the IB objective.
- Provides a grammar for creating neural operations and a pathway from data properties to architecture design.
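
To make the Path / Shape / Weight decomposition concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the framework's actual formulation: the names `gwo`, `local_path`, `global_path`, `static_weight`, and `softmax_weight` are hypothetical, Path and Shape are folded into one index function for brevity, and the attention variant uses identity query/key/value projections. Under those assumptions, a local window with a fixed kernel behaves like a zero-padded 1D convolution, and a global window with content-dependent weights behaves like scaled dot-product self-attention.

```python
import numpy as np

def gwo(x, path, weight):
    """Hypothetical generalized windowed operation over a sequence x of shape (n, d).

    path(i, n)            -> indices visible from position i (Path and Shape, folded together)
    weight(x_i, x_win, r) -> one weight per visible position (Weight); r holds relative offsets
    """
    n = x.shape[0]
    out = np.zeros(x.shape, dtype=float)
    for i in range(n):
        idx = path(i, n)
        w = weight(x[i], x[idx], idx - i)
        out[i] = w @ x[idx]  # weighted sum of the windowed features
    return out

def local_path(radius):
    # Contiguous neighbourhood around i, clipped at the sequence boundaries.
    def path(i, n):
        return np.arange(max(0, i - radius), min(n, i + radius + 1))
    return path

def global_path(i, n):
    # Every position is visible from every other position.
    return np.arange(n)

def static_weight(kernel):
    # Content-independent weights looked up by relative offset, as in convolution.
    radius = len(kernel) // 2
    def weight(x_i, x_win, rel):
        return kernel[rel + radius]
    return weight

def softmax_weight(x_i, x_win, rel):
    # Content-dependent weights: scaled dot-product scores followed by a softmax.
    scores = x_win @ x_i / np.sqrt(x_i.shape[0])
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
kernel = np.array([0.25, 0.5, 0.25])

conv_out = gwo(x, local_path(1), static_weight(kernel))  # convolution-like configuration
attn_out = gwo(x, global_path, softmax_weight)           # attention-like configuration
```

Only the two arguments passed to `gwo` change between the configurations; the loop that gathers a window and takes a weighted sum is shared. The explicit Python loop is chosen for readability, not speed.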