Beyond Attention: Toward Machines with Intrinsic Higher Mental States
- #attention mechanisms
- #neuroscience
- #machine learning
- The paper explores how machine learning models can emulate high-level perceptual processing and awake thought states to pre-select relevant information before applying attention.
- Inspired by neurobiological evidence, it introduces triadic neuronal-level modulation loops among questions (Q), clues (K), and hypotheses (V) to enable deep, parallel reasoning chains.
- This approach leads to faster learning with reduced computational demand, achieving approximately O(N) cost in the number of input tokens N, compared with the O(N²) cost of standard self-attention.
- Results demonstrate effectiveness in reinforcement learning, computer vision, and natural language question answering.
- The method allows models to shift rapidly from initial biases to refined understanding, improving efficiency and performance.
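The pre-selection idea above can be illustrated with a toy sketch. This is not the paper's actual triadic modulation mechanism, whose details are not given here; it is a minimal, assumed example in which a cheap O(N) salience score over the keys (clues) gates which tokens are kept before ordinary attention is applied, so the quadratic step runs only over a small fixed subset. The function name `preselect_attention` and the norm-based salience are illustrative assumptions.

```python
import numpy as np

def preselect_attention(Q, K, V, m=4):
    """Toy sketch: gate tokens by a cheap salience score, then attend.

    Hypothetical mechanism -- the paper's actual triadic Q/K/V
    modulation loops are more elaborate than this illustration.
    """
    # Cheap O(N) salience: each key's norm stands in for 'clue strength'.
    salience = np.linalg.norm(K, axis=-1)            # shape (N,)
    keep = np.argsort(salience)[-m:]                 # indices of top-m clues
    Ks, Vs = K[keep], V[keep]                        # pre-selected subset
    # Standard scaled dot-product attention over only the m kept tokens,
    # so this step costs O(N * m) instead of O(N^2).
    scores = Q @ Ks.T / np.sqrt(Q.shape[-1])         # shape (N, m)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over kept clues
    return w @ Vs                                    # shape (N, d)

rng = np.random.default_rng(0)
N, d = 16, 8
Q, K, V = rng.normal(size=(3, N, d))
out = preselect_attention(Q, K, V, m=4)
print(out.shape)
```

With m fixed, total cost grows linearly in N, which matches the spirit of the approximate O(N) claim, though the paper's own mechanism may achieve it differently.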