Chomsky and the Two Cultures of Statistical Learning
- #Statistical Models
- #Chomsky
- #Computational Linguistics
- Chomsky criticizes statistical models in machine learning for mimicking behavior without understanding meaning.
- Statistical models such as Markov chains need large amounts of data and many parameters, as in Claude Shannon's n-gram approximations of English (see the bigram sketch after this list).
- Newton's theory of gravitation is a trained but deterministic model: the gravitational constant is estimated from data, yet its predictions carry no probabilities, in contrast to probabilistic models like Shannon's.
- The ideal gas law (PV = nRT) is a statistical model: it yields accurate predictions and insight by averaging over molecular motion rather than tracking individual molecules (see the kinetic-theory sketch after this list).
- Statistical models beat categorical rules on accuracy, as shown by the 'I before E except after C' spelling example (see the counting sketch after this list).
- Probabilistic models are dominant in computational linguistics due to their performance and adaptability.
- Chomsky objects that probabilistic models cannot handle novel sentences such as 'colorless green ideas sleep furiously' and that they provide no scientific insight.
- Probabilistic models such as probabilistic context-free grammars (PCFGs) are state-of-the-art in parsing and represent linguistic facts better than purely categorical models (a toy PCFG appears after this list).
- Chomsky prefers simple, elegant models focusing on competence over performance, aligning with a Platonic view of language.
- Breiman's 'Two Cultures' contrasts data modeling (simple, interpretable) with algorithmic modeling (complex, accurate but less interpretable).
- Chomsky's pro-drop parameter theory is critiqued for oversimplifying the messy reality of language use.
- Probabilistic models are essential for interpretation tasks such as speech recognition, where ambiguity and noise must be handled (see the noisy-channel sketch after this list).
- Chomsky's focus on generative language processes contrasts with the probabilistic needs of interpretation.
- Language is seen as a contingent, evolving system best analyzed with probabilistic models, contrary to Chomsky's ideal forms.
- Gold's Theorem (context-free languages cannot be identified in the limit from positive examples alone) is often cited as support for innate language capabilities, while Horning's result shows that probabilistic context-free grammars can be learned, to within a small error, from positive examples.
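
A minimal sketch of the Markov-chain point above: a character-level bigram model trained on a placeholder string (not Shannon's actual experiment). It shows how such a model is estimated from simple counts and why the number of parameters grows quickly with the model's order.

```python
import random
from collections import defaultdict, Counter

def train_bigram(text):
    """Count character bigrams: counts[a][b] = how often b follows a."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample(counts, start, length=40):
    """Generate text by repeatedly sampling the next character
    in proportion to its observed bigram count."""
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog " * 3  # placeholder corpus
model = train_bigram(corpus)
print(sample(model, "t"))
# A bigram model over an alphabet of size V already has up to V*V parameters;
# Shannon's higher-order approximations grow as V^n.
```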
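
The ideal-gas bullet rests on the standard kinetic-theory derivation, reproduced here in outline to show how a law about bulk quantities emerges from averaging over random molecular motion:

```latex
% Pressure from averaging over N molecules of mass m with mean squared speed \overline{v^2}:
PV = \tfrac{1}{3} N m \,\overline{v^2}
% Temperature fixes the mean kinetic energy, \tfrac{1}{2} m \overline{v^2} = \tfrac{3}{2} k_B T, so
PV = N k_B T = nRT
```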
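
The 'I before E' comparison can be reproduced roughly by counting letter sequences in any word list; the file name and one-word-per-line format below are assumptions, not Norvig's actual corpus (he weights words by frequency).

```python
from collections import Counter

def sequence_counts(path):
    """Tally how often 'ie', 'ei', 'cie', and 'cei' occur in a word list.

    Assumes one word per line; weighting by corpus frequency would
    additionally need a count column.
    """
    totals = Counter()
    with open(path, encoding="utf-8") as f:
        for word in f:
            word = word.strip().lower()
            for seq in ("ie", "ei", "cie", "cei"):
                totals[seq] += word.count(seq)
    return totals

# Hypothetical word list; substitute any local lexicon file.
counts = sequence_counts("words.txt")
print(counts["ie"], counts["ei"])    # rule part 1: is 'ie' really more common?
print(counts["cie"], counts["cei"])  # rule part 2: does 'except after c' hold?
```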
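
A toy probabilistic context-free grammar, assuming NLTK is available; the rules and probabilities are invented for illustration and are nothing like a state-of-the-art parser, but they show how a PCFG scores competing parses.

```python
import nltk

# A tiny hand-written PCFG; rule probabilities for each left-hand side sum to 1.
grammar = nltk.PCFG.fromstring("""
    S    -> NP VP   [1.0]
    NP   -> Det N   [0.6]
    NP   -> 'dogs'  [0.4]
    VP   -> V NP    [1.0]
    Det  -> 'the'   [1.0]
    N    -> 'cat'   [0.5]
    N    -> 'dogs'  [0.5]
    V    -> 'chase' [1.0]
""")

parser = nltk.ViterbiParser(grammar)  # finds the most probable parse
for tree in parser.parse("dogs chase the cat".split()):
    tree.pretty_print()
    print("parse probability:", tree.prob())
```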
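
The interpretation point is usually formalized as noisy-channel decoding: choose the word sequence w that maximizes P(w) * P(signal | w). The probability tables below are invented placeholders that only illustrate the argmax, not a real recognizer.

```python
# Noisy-channel decoding sketch: pick the word sequence w maximizing
#   P(w) * P(acoustic_signal | w)
# Both distributions are tiny hand-made tables, purely for illustration.

language_model = {            # prior P(w): how plausible is the phrase?
    "recognize speech": 0.7,
    "wreck a nice beach": 0.3,
}
acoustic_model = {            # likelihood P(signal | w) for one heard signal
    "recognize speech": 0.4,
    "wreck a nice beach": 0.5,
}

def decode(candidates):
    """Return the candidate with the highest posterior score (up to a constant)."""
    return max(candidates, key=lambda w: language_model[w] * acoustic_model[w])

print(decode(language_model.keys()))  # -> 'recognize speech' (0.28 vs 0.15)
```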