Mr. Chatterbox is a (weak) Victorian-era ethically trained model
11 hours ago
- #victorian-era
- #open-source
- #language-model
- Mr. Chatterbox is a Victorian-era language model trained exclusively on 28,035 out-of-copyright British texts from 1837 to 1899.
- The model has about 340 million parameters, similar to GPT-2-Medium, but was trained on only 2.93 billion tokens of historical text.
- It is small (2.05GB) and can be tried in a hosted HuggingFace demo or run locally through an LLM plugin called llm-mrchatterbox.
- Despite its charming Victorian style, its conversational ability is limited: replies read more like Markov chain output than like a modern LLM.
- Training data may need to be quadrupled to improve usefulness, but the project is a promising start for public domain models.
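
To put the figures above in proportion, here is a minimal sketch of the arithmetic. It uses only the numbers quoted in the summary (about 340 million parameters, 2.93 billion training tokens, a fourfold increase in data). The function and constant names are illustrative and do not come from the project.

```python
# Figures quoted in the summary; the parameter count is approximate.
PARAMETERS = 340_000_000         # "about 340 million parameters"
TRAINING_TOKENS = 2_930_000_000  # "2.93 billion training tokens"
SCALE_FACTOR = 4                 # "quadrupled"


def tokens_per_parameter(tokens: int, parameters: int) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / parameters


def scaled_tokens(tokens: int, factor: int) -> int:
    """Return the token count after multiplying the dataset by `factor`."""
    return tokens * factor


current_ratio = tokens_per_parameter(TRAINING_TOKENS, PARAMETERS)
quadrupled_tokens = scaled_tokens(TRAINING_TOKENS, SCALE_FACTOR)
quadrupled_ratio = tokens_per_parameter(quadrupled_tokens, PARAMETERS)

print(f"Current:    {TRAINING_TOKENS / 1e9:.2f}B tokens, "
      f"{current_ratio:.2f} tokens per parameter")
print(f"Quadrupled: {quadrupled_tokens / 1e9:.2f}B tokens, "
      f"{quadrupled_ratio:.2f} tokens per parameter")
```

With these inputs the current dataset works out to roughly 8.6 tokens per parameter, and a quadrupled dataset would be about 11.72 billion tokens, or roughly 34 tokens per parameter at the same model size.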