Hasty Briefs (beta)

I unified convolution and attention into a single framework

8 hours ago
  • #deep learning
  • #generalization
  • #neural operations
  • Introduces the Generalized Windowed Operation (GWO), which unifies deep learning operations such as matrix multiplication and convolution.
  • Decomposes operations into three components: Path (operational locality), Shape (geometric structure), and Weight (feature importance).
  • Proposes the Principle of Structural Alignment: generalization is optimal when the GWO configuration mirrors the data's intrinsic structure.
  • Links the Principle of Structural Alignment to the Information Bottleneck (IB) principle.
  • Defines an Operational Complexity metric based on Kolmogorov complexity, emphasizing the quality of an operation's complexity over its quantity.
  • Argues that adaptive alignment with data structure leads to superior generalization bounds.
  • Shows that canonical operations and their variants are optimal solutions to the IB objective.
  • Provides a grammar for creating neural operations and a pathway from data properties to architecture design.
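
The Path / Shape / Weight decomposition in the summary can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the author's formulation: it works on a 1D list of scalars, treats Path as "which positions a given output may read", folds Shape into the tuple of offsets passed to the local path, and treats Weight as either fixed coefficients (convolution-like) or a softmax over pairwise similarity (attention-like). All function names (`windowed_op`, `local_path`, `global_path`, `static_weight`, `dynamic_weight`, `read`) are hypothetical.

```python
import math

def read(x, j):
    # Zero padding: positions outside the sequence read as 0.0 (an assumption).
    return x[j] if 0 <= j < len(x) else 0.0

def local_path(offsets):
    # Path: each output reads a local neighbourhood.
    # The tuple `offsets` stands in for Shape (the window's geometry).
    def path(i, n):
        return [i + d for d in offsets]
    return path

def global_path():
    # Path: each output may read every position in the sequence.
    def path(i, n):
        return list(range(n))
    return path

def static_weight(kernel):
    # Weight: fixed coefficients shared by all positions (convolution-like).
    def weight(x, i, window):
        return list(kernel)
    return weight

def dynamic_weight():
    # Weight: data-dependent, softmax over the similarity x[i] * x[j]
    # (attention-like; the similarity function is an illustrative choice).
    def weight(x, i, window):
        scores = [x[i] * read(x, j) for j in window]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
    return weight

def windowed_op(x, path, weight):
    # For every position, gather the window chosen by `path`
    # and combine it using the coefficients chosen by `weight`.
    n = len(x)
    out = []
    for i in range(n):
        window = path(i, n)
        coeffs = weight(x, i, window)
        out.append(sum(c * read(x, j) for c, j in zip(coeffs, window)))
    return out

x = [1.0, 2.0, 3.0, 4.0]
conv_like = windowed_op(x, local_path((-1, 0, 1)), static_weight((0.25, 0.5, 0.25)))
attn_like = windowed_op(x, global_path(), dynamic_weight())
print(conv_like)
print(attn_like)
```

Swapping only the `path` and `weight` arguments moves the same loop between a convolution-like and an attention-like operation, which is the sense in which a single windowed template could cover both. How the original work parameterizes each component may differ from this sketch.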