Chomsky and the Two Cultures of Statistical Learning
- #Statistical Models
- #Chomsky
- #Computational Linguistics
- Chomsky criticizes statistical models in machine learning for mimicking behavior without understanding meaning.
- Statistical models such as Markov chains need large amounts of data and many parameters, as in Claude Shannon's n-gram approximations of English (see the bigram sketch after this list).
- Newton's theory of gravitation is a trained but deterministic model: the gravitational constant is estimated from data, yet its predictions carry no probabilities, in contrast to probabilistic models like Shannon's.
- The ideal gas law (PV = nRT) is a statistical model: it yields accurate predictions and insight by averaging over molecular motion rather than tracking individual molecules (see the kinetic-theory sketch after this list).
- Statistical models beat categorical rules on accuracy, as shown by the 'I before E except after C' spelling example (see the counting sketch after this list).
- Probabilistic models are dominant in computational linguistics due to their performance and adaptability.
- Chomsky objects that probabilistic models cannot handle novel sentences such as 'colorless green ideas sleep furiously' and that they provide no scientific insight.
- Probabilistic models such as probabilistic context-free grammars (PCFGs) are state-of-the-art in parsing and represent linguistic facts better than purely categorical models (a toy PCFG appears after this list).
- Chomsky prefers simple, elegant models focusing on competence over performance, aligning with a Platonic view of language.
- Breiman's 'Two Cultures' contrasts data modeling (simple, interpretable) with algorithmic modeling (complex, accurate but less interpretable).
- Chomsky's pro-drop parameter theory is critiqued for oversimplifying the messy reality of language use.
- Probabilistic models are essential for interpretation tasks such as speech recognition, where ambiguity and noise must be handled (see the noisy-channel sketch after this list).
- Chomsky's focus on generative language processes contrasts with the probabilistic needs of interpretation.
- Language is seen as a contingent, evolving system best analyzed with probabilistic models, contrary to Chomsky's ideal forms.
- Gold's Theorem (context-free languages cannot be identified in the limit from positive examples alone) is often cited as support for innate language capabilities, while Horning's result shows that probabilistic context-free grammars can be learned, to within a small error, from positive examples.
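
A minimal sketch of the Markov-chain point above: a character-level bigram model trained on a placeholder string (not Shannon's actual experiment). It shows how such a model is estimated from simple counts and why the number of parameters grows quickly with the model's order.

```python
import random
from collections import defaultdict, Counter

def train_bigram(text):
    """Count character bigrams: counts[a][b] = how often b follows a."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample(counts, start, length=40):
    """Generate text by repeatedly sampling the next character
    in proportion to its observed bigram count."""
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog " * 3  # placeholder corpus
model = train_bigram(corpus)
print(sample(model, "t"))
# A bigram model over an alphabet of size V already has up to V*V parameters;
# Shannon's higher-order approximations grow as V^n.
```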
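
The ideal-gas bullet rests on the standard kinetic-theory derivation, reproduced here in outline to show how a law about bulk quantities emerges from averaging over random molecular motion:

```latex
% Pressure from averaging over N molecules of mass m with mean squared speed \overline{v^2}:
PV = \tfrac{1}{3} N m \,\overline{v^2}
% Temperature fixes the mean kinetic energy, \tfrac{1}{2} m \overline{v^2} = \tfrac{3}{2} k_B T, so
PV = N k_B T = nRT
```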
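
The 'I before E' comparison can be reproduced roughly by counting letter sequences in any word list; the file name and one-word-per-line format below are assumptions, not Norvig's actual corpus (he weights words by frequency).

```python
from collections import Counter

def sequence_counts(path):
    """Tally how often 'ie', 'ei', 'cie', and 'cei' occur in a word list.

    Assumes one word per line; weighting by corpus frequency would
    additionally need a count column.
    """
    totals = Counter()
    with open(path, encoding="utf-8") as f:
        for word in f:
            word = word.strip().lower()
            for seq in ("ie", "ei", "cie", "cei"):
                totals[seq] += word.count(seq)
    return totals

# Hypothetical word list; substitute any local lexicon file.
counts = sequence_counts("words.txt")
print(counts["ie"], counts["ei"])    # rule part 1: is 'ie' really more common?
print(counts["cie"], counts["cei"])  # rule part 2: does 'except after c' hold?
```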
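
A toy probabilistic context-free grammar, assuming NLTK is available; the rules and probabilities are invented for illustration and are nothing like a state-of-the-art parser, but they show how a PCFG scores competing parses.

```python
import nltk

# A tiny hand-written PCFG; rule probabilities for each left-hand side sum to 1.
grammar = nltk.PCFG.fromstring("""
    S    -> NP VP   [1.0]
    NP   -> Det N   [0.6]
    NP   -> 'dogs'  [0.4]
    VP   -> V NP    [1.0]
    Det  -> 'the'   [1.0]
    N    -> 'cat'   [0.5]
    N    -> 'dogs'  [0.5]
    V    -> 'chase' [1.0]
""")

parser = nltk.ViterbiParser(grammar)  # finds the most probable parse
for tree in parser.parse("dogs chase the cat".split()):
    tree.pretty_print()
    print("parse probability:", tree.prob())
```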
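
The interpretation point is usually formalized as noisy-channel decoding: choose the word sequence w that maximizes P(w) * P(signal | w). The probability tables below are invented placeholders that only illustrate the argmax, not a real recognizer.

```python
# Noisy-channel decoding sketch: pick the word sequence w maximizing
#   P(w) * P(acoustic_signal | w)
# Both distributions are tiny hand-made tables, purely for illustration.

language_model = {            # prior P(w): how plausible is the phrase?
    "recognize speech": 0.7,
    "wreck a nice beach": 0.3,
}
acoustic_model = {            # likelihood P(signal | w) for one heard signal
    "recognize speech": 0.4,
    "wreck a nice beach": 0.5,
}

def decode(candidates):
    """Return the candidate with the highest posterior score (up to a constant)."""
    return max(candidates, key=lambda w: language_model[w] * acoustic_model[w])

print(decode(language_model.keys()))  # -> 'recognize speech' (0.28 vs 0.15)
```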