I unified convolution and attention into a single framework

8 hours ago

Introduces the Generalized Windowed Operation (GWO) to unify deep learning operations such as matrix multiplication and convolution.
Decomposes operations into three components: Path (operational locality), Shape (geometric structure), and Weight (feature importance).
Proposes the Principle of Structural Alignment: generalization is optimal when the GWO configuration mirrors the data's intrinsic structure.
Links the Principle of Structural Alignment to the Information Bottleneck (IB) principle.
Defines an Operational Complexity metric based on Kolmogorov complexity, emphasizing the quality of complexity over quantity.
Argues that adaptive alignment with data structure leads to superior generalization bounds.
Shows that canonical operations and their variants are optimal solutions to the IB objective.
Provides a grammar for creating neural operations and a pathway from data properties to architecture design.
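
The Path / Shape / Weight decomposition can be illustrated with a small sketch. This is only one possible reading of the three components, not the formalism from the original work: the names `gwo`, `local_path`, `global_path`, `fixed_weight`, and `softmax_weight` are hypothetical, features are scalars for simplicity, and "Shape" is assumed to mean the set of offsets that make up a window.

```python
import math

def local_path(i, n, shape):
    # Path = local: visit only positions near i.
    # Shape = the offsets that define the window geometry (assumption).
    return [i + o for o in shape if 0 <= i + o < n]

def global_path(i, n, shape):
    # Path = global: every position is visited; shape is unused here.
    return list(range(n))

def fixed_weight(kernel):
    # Weight = static, content-independent (convolution-like).
    # kernel maps an offset (j - i) to a learned scalar.
    def weight(i, js, x):
        return [kernel[j - i] for j in js]
    return weight

def softmax_weight(i, js, x):
    # Weight = dynamic, content-dependent (attention-like):
    # softmax over dot-product scores of scalar features.
    scores = [x[i] * x[j] for j in js]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gwo(x, path, shape, weight):
    # One windowed operation: for each position, pick the positions to
    # visit (path + shape), score them (weight), and aggregate.
    n = len(x)
    out = []
    for i in range(n):
        js = path(i, n, shape)
        ws = weight(i, js, x)
        out.append(sum(w * x[j] for w, j in zip(ws, js)))
    return out

x = [1.0, 2.0, 3.0, 4.0]

# Convolution-like: local path, contiguous 3-wide shape, fixed weights.
conv_out = gwo(x, local_path, (-1, 0, 1), fixed_weight({-1: 1.0, 0: 2.0, 1: 1.0}))

# Attention-like: global path, content-dependent weights.
attn_out = gwo(x, global_path, None, softmax_weight)
```

Under these assumptions, swapping only the path and weight functions moves the same loop from a convolution-like operation to an attention-like one, which is the kind of unification the summary describes. A dilated window would be a change of shape alone, for example `(-2, 0, 2)`.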
