Show HN: I trained a 1B LLM from scratch for $315 and open-sourced weights+data
11 hours ago
- #Language Model
- #Open Source
- #AI Safety
- Tessera 1B is a ~1B-parameter open-source language model trained from scratch by AIIT-THRESHOLD on a hand-curated 24.5B-token corpus.
- It is intended as a clean, honest base model for fine-tuning: it produces fluent English and some Japanese, but its reasoning and factual reliability are limited out of the box.
- Key details: a custom decoder-only transformer (32 layers, d_model 1536, 16 attention heads, 4096-token context length), trained on web, books, and academic data for ~145.7 hours at a cost of ~$315.
- Evaluation focuses on language-model loss (~3.20 nats); no full standard-benchmark suite was run. The release includes optional LoRA adapters for demonstration, and the model requires custom loading via the provided scripts.
- The model is licensed under Apache-2.0, the training data is drawn from content licensed on a per-source basis, and the release emphasizes transparency about its data policy and limitations.
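
The headline numbers above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is not from the release: it assumes a standard transformer block (the common `12 * L * d^2` rule of thumb with a 4x MLP expansion, which a custom architecture may not follow), a single pass over the 24.5B-token corpus, and that the ~145.7 hours are the billed hours. The function and variable names are illustrative only.

```python
import math

# Figures quoted in the summary above.
N_LAYERS = 32
D_MODEL = 1536
TRAIN_TOKENS = 24.5e9
TRAIN_HOURS = 145.7
COST_USD = 315.0
LOSS_NATS = 3.20


def approx_block_params(n_layers: int, d_model: int) -> int:
    """Rule-of-thumb parameter count for the transformer blocks only.

    Assumes ~4*d^2 attention weights plus ~8*d^2 MLP weights per layer
    (4x expansion). Excludes embeddings, biases, and norm parameters.
    """
    return 12 * n_layers * d_model ** 2


block_params = approx_block_params(N_LAYERS, D_MODEL)

# Cross-entropy in nats converts to perplexity via exp, and to bits via ln(2).
perplexity = math.exp(LOSS_NATS)
bits_per_token = LOSS_NATS / math.log(2)

# Assumption: the quoted hours are what the quoted cost was billed for.
cost_per_hour = COST_USD / TRAIN_HOURS

# Assumption: one pass over the corpus during those hours.
tokens_per_second = TRAIN_TOKENS / (TRAIN_HOURS * 3600)

if __name__ == "__main__":
    print(f"block parameters (estimate): {block_params / 1e9:.2f}B")
    print(f"perplexity: {perplexity:.1f}")
    print(f"bits per token: {bits_per_token:.2f}")
    print(f"cost per hour: ${cost_per_hour:.2f}")
    print(f"throughput: {tokens_per_second:,.0f} tokens/s")
```

Under these assumptions the blocks alone come to roughly 0.91B parameters, which is consistent with the "~1B" label once embedding weights are added. The vocabulary size is not given in the summary, so the embedding contribution is left out rather than guessed.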