Context Is Software, Weights Are Hardware
3 days ago
- #LLMs
- #transformer-architecture
- #continual-learning
- Increasing context window length and improving KV-cache compression are a popular approach to continual learning in LLMs, one that bets on in-context learning rather than weight updates.
- Context (via the KV cache) and weights both shape activations in a transformer, and in that sense serve the same function: in-context learning shifts internal representations temporarily, while fine-tuning changes them permanently.
- Weights act like hardware, defining computational capabilities, while context functions as software running on that hardware; weight modification adds new "instructions," enabling computations beyond the pretrained model's original scope.
- Long context works well for tasks inside the pretraining distribution, but hits a ceiling when a task needs representations pretraining never built, such as domain-specific knowledge or idiosyncratic patterns; that is where weight updates excel.
- Weight modification wins on inference cost (O(1) per token vs. O(n) attention over an n-token context), on compression (a small adapter vs. a long token sequence), and on composability (updates accumulate, whereas a single forward pass must approximate everything at once).
- The brain's memory systems (hippocampus for fast, temporary memory and neocortex for slow, persistent storage) provide a biological analogy, suggesting complementary roles for context and weight-based learning.
- Future development should integrate both methods: longer context for working memory and weight-space learning for accumulating persistent, generalizable knowledge, as neither alone is sufficient for comprehensive continual learning.
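The "two routes to the same end" claim can be made concrete with a toy single-head attention layer: appending a token to the KV cache shifts the output only for calls that carry that context, while a low-rank (LoRA-style) delta to the value projection shifts it for every future call. This is a minimal sketch in NumPy; all shapes, seeds, and values are illustrative assumptions, not anything from the post.

```python
# Toy demo: a cached context token and a weight delta are two routes
# to the same end -- shifted activations. All dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy model width
W_v = rng.normal(size=(d, d)) * 0.1      # value projection ("hardware")

def attend(query, keys, values):
    """Softmax attention of one query vector over cached keys/values."""
    scores = keys @ query / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values

x = rng.normal(size=d)                   # current token's representation
kv_keys = rng.normal(size=(3, d))        # KV cache from 3 context tokens
kv_vals = kv_keys @ W_v.T

base = attend(x, kv_keys, kv_vals)

# Route 1: "software" -- append one more token to the KV cache.
# The shift lasts only as long as that token stays in context.
extra = rng.normal(size=d)
ctx_out = attend(x,
                 np.vstack([kv_keys, extra]),
                 np.vstack([kv_vals, extra @ W_v.T]))

# Route 2: "hardware" -- a low-rank (LoRA-style) update to W_v.
# The shift now applies to every future call, with no context needed.
A = rng.normal(size=(d, 1)) * 0.1
B = rng.normal(size=(1, d)) * 0.1
W_v_new = W_v + A @ B
wt_out = attend(x, kv_keys, kv_keys @ W_v_new.T)

# Both routes move the activation away from the baseline.
print(np.linalg.norm(ctx_out - base), np.linalg.norm(wt_out - base))
```

The difference in character shows up in the code: route 1 leaves `W_v` untouched (drop the extra token and the shift vanishes), while route 2 leaves the cache untouched (the shift persists with an empty context).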
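The compression argument is easy to check with back-of-envelope arithmetic. Assuming an illustrative 7B-class decoder (32 layers, hidden size 4096, full multi-head attention, fp16 storage; these numbers are my assumptions, not the post's), a 100k-token KV cache dwarfs a LoRA adapter by three orders of magnitude:

```python
# Back-of-envelope memory comparison: long-context KV cache vs. a LoRA
# adapter. Model dimensions below are illustrative assumptions only.
n_layers = 32        # transformer blocks
d_model = 4096       # hidden size
bytes_fp16 = 2       # bytes per value in half precision

# KV cache: one K vector and one V vector of size d_model per layer per token.
kv_bytes_per_token = 2 * n_layers * d_model * bytes_fp16

def kv_cache_gib(n_tokens: int) -> float:
    """KV-cache memory for a context of n_tokens, in GiB."""
    return n_tokens * kv_bytes_per_token / 2**30

# LoRA adapter: low-rank factors A (r x d) and B (d x r) on two target
# matrices (query and value projections) in every layer.
rank = 16
adapter_params = n_layers * 2 * (2 * rank * d_model)
adapter_mib = adapter_params * bytes_fp16 / 2**20

print(f"KV cache, 100k-token context: {kv_cache_gib(100_000):.1f} GiB")
print(f"LoRA adapter (rank {rank}):   {adapter_mib:.1f} MiB")
```

Under these assumptions the cache costs about half a megabyte per token (roughly 49 GiB at 100k tokens), while the adapter fits in about 16 MiB, which is the compression gap the post gestures at. Grouped-query attention and cache quantization shrink the cache, but not by enough to close a gap that grows linearly with context length.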