Hasty Briefs (beta)


Mr. Chatterbox is a (weak) Victorian-era ethically trained model

13 hours ago
  • #victorian-era
  • #open-source
  • #language-model
  • Mr. Chatterbox is a Victorian-era language model trained exclusively on 28,035 out-of-copyright British texts from 1837 to 1899.
  • The model has about 340 million parameters, similar to GPT-2 Medium, but was trained on only 2.93 billion tokens of historical text.
  • It is small (2.05GB): it can be tried through a Hugging Face demo or run locally with llm-mrchatterbox, a plugin for the LLM command-line tool.
  • Despite its charming Victorian style, its conversational ability is limited: its replies read more like the output of a Markov chain than of a modern LLM.
  • The training data may need to be roughly quadrupled for the model to become genuinely useful, but the project is a promising start for models trained entirely on public-domain text.
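A quick back-of-the-envelope calculation puts the parameter and token figures above in context by computing training tokens per parameter, now and after quadrupling the data. The comparison point of about 20 tokens per parameter is an outside assumption (a rule of thumb commonly cited from DeepMind's Chinchilla scaling study), not something stated in the summary, and this is only a rough sketch rather than the author's own reasoning.

```python
# Figures quoted in the summary.
PARAMS = 340e6          # ~340 million parameters
TRAIN_TOKENS = 2.93e9   # 2.93 billion training tokens

# Assumption (not from the summary): the ~20 tokens-per-parameter
# rule of thumb often attributed to the Chinchilla scaling study.
CHINCHILLA_TOKENS_PER_PARAM = 20


def tokens_per_param(tokens: float, params: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


current = tokens_per_param(TRAIN_TOKENS, PARAMS)
quadrupled = tokens_per_param(4 * TRAIN_TOKENS, PARAMS)
chinchilla_tokens = CHINCHILLA_TOKENS_PER_PARAM * PARAMS

print(f"current:    {current:.1f} tokens/param")             # 8.6
print(f"quadrupled: {quadrupled:.1f} tokens/param")          # 34.5
print(f"~20x rule:  {chinchilla_tokens / 1e9:.1f}B tokens")  # 6.8
```

On these numbers the current corpus gives under 9 tokens per parameter; quadrupling it (to about 11.7 billion tokens) would comfortably exceed the roughly 6.8 billion tokens the 20x heuristic suggests for a model of this size.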