Hasty Briefs (beta)

Show HN: I trained a 1B LLM from scratch for $315 and open-sourced weights+data

13 hours ago
  • #Language Model
  • #Open Source
  • #AI Safety
  • Tessera 1B is a ~1B-parameter open-source language model trained from scratch by AIIT-THRESHOLD on a hand-curated 24.5B-token corpus.
  • It serves as a clean, honest base model for fine-tuning, producing fluent English and some Japanese, but with limited reasoning and factual reliability out-of-the-box.
  • The model uses a custom decoder-only transformer architecture (32 layers, d_model of 1536, 16 heads, 4096-token context length) and was trained on web, books, and academic data for ~145.7 hours at a cost of ~$315.
  • Evaluation focuses on language-model loss (~3.20 nats); no full standard-benchmark suite was run. The release includes optional LoRA adapters for demonstration, and loading the model requires the provided custom scripts.
  • The model is licensed under Apache-2.0, the training data is licensed per source, and the release emphasizes transparency about its data policy and limitations.
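
The figures quoted above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is not taken from the Tessera release. It assumes a textbook decoder-only layout (4·d² attention weights plus 8·d² MLP weights per layer, with embeddings, norms, and biases ignored), which may not match the model's custom architecture. It also treats the reported ~145.7 hours as the billing unit, which the summary does not confirm. All function and constant names are hypothetical.

```python
import math

# Figures quoted in the summary above.
N_LAYERS = 32
D_MODEL = 1536
N_HEADS = 16
TRAIN_HOURS = 145.7
COST_USD = 315.0
LOSS_NATS = 3.20


def approx_block_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count for a textbook decoder-only stack.

    Assumption (not from the source): each layer holds 4*d^2 attention
    weights (Q, K, V, output projection) and 8*d^2 MLP weights (4x
    expansion). Embeddings, norms, and biases are ignored.
    """
    return 12 * n_layers * d_model ** 2


def nats_to_perplexity(loss_nats: float) -> float:
    """Per-token perplexity implied by a cross-entropy loss given in nats."""
    return math.exp(loss_nats)


def nats_to_bits(loss_nats: float) -> float:
    """Convert a per-token loss from nats to bits."""
    return loss_nats / math.log(2)


if __name__ == "__main__":
    params = approx_block_params(N_LAYERS, D_MODEL)
    print(f"approx. non-embedding parameters: {params / 1e9:.3f}B")
    print(f"head dimension: {D_MODEL // N_HEADS}")
    print(f"implied cost per reported hour: ${COST_USD / TRAIN_HOURS:.2f}")
    print(f"perplexity at {LOSS_NATS} nats: {nats_to_perplexity(LOSS_NATS):.2f}")
    print(f"loss in bits per token: {nats_to_bits(LOSS_NATS):.3f}")
```

Under these assumptions the transformer blocks alone come to roughly 0.9B parameters. Adding an embedding table would plausibly bring that to the stated ~1B, although the vocabulary size is not given in the summary. The perplexity depends on the tokenizer, so it is not directly comparable across models.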