Domain Adaptation of Base Models + ShadowdarkQA Bench
- #Shadowdark
- #LLM
- #TTRPG
- Two approaches to developing an autonomous LLM Game Master: a fast path (agentic scaffolding) vs. a slow path (hands-on model training).
- Goal is understanding model capabilities and gaining hands-on experience, not just the end product.
- Starting with base models to bake in TTRPG-specific priors for better rule understanding.
- Compute constraints led to choosing smaller models from the Qwen3 series (0.6B to 14B).
- Shadowdark RPG chosen over D&D for its simplicity, the models' lack of prior knowledge of it, and ease of answer verification.
- OCR used to extract Shadowdark rules into clean markdown format for training data.
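A minimal sketch of the kind of OCR post-processing this implies; the specific heuristics (hyphenation repair, promoting ALL-CAPS section titles to headings) are assumptions, not the author's actual pipeline:

```python
import re

def clean_ocr_page(raw: str) -> str:
    """Normalize raw OCR output from a rulebook page into markdown-ish text."""
    # Re-join words hyphenated across line breaks ("cast-\ning" -> "casting").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Promote short ALL-CAPS lines (likely section titles) to markdown headings.
        if stripped and stripped.isupper() and len(stripped.split()) <= 6:
            lines.append(f"## {stripped.title()}")
        else:
            lines.append(stripped)
    # Collapse runs of blank lines left behind by page layout.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```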
- Created Shadowdark QA Bench with categories like spell_mechanics, player_characters, monsters, etc.
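One bench item might be stored as a JSONL record along these lines; the field names and the sample fact are illustrative assumptions, not the benchmark's actual schema:

```python
import json

# Hypothetical ShadowdarkQA item: a category, a question, and the keywords
# a correct answer is expected to contain.
item = {
    "category": "spell_mechanics",
    "question": "What is the duration of the Light spell?",
    "keywords": ["1 hour", "real time"],
}

jsonl_line = json.dumps(item)  # one item per line in the bench file
```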
- Evaluation metric: keyword-based answer matching for precise grading of rule recall.
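Keyword grading can be as simple as requiring every expected keyword to appear in the model's answer; a sketch under that assumption (function names are mine, not the bench's):

```python
def grade_answer(answer: str, keywords: list[str]) -> bool:
    """Pass only if every expected keyword appears, case-insensitively."""
    text = answer.lower()
    return all(kw.lower() in text for kw in keywords)

def accuracy(items: list[dict], answers: list[str]) -> float:
    """Fraction of bench items whose model answer contains all keywords."""
    hits = sum(
        grade_answer(ans, item["keywords"]) for item, ans in zip(items, answers)
    )
    return hits / len(items)
```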
- Initial training on the sourcebooks improved performance, but the models struggled with numerical recall.
- Knowledge augmentation (10x restatements) boosted performance to 66.6% on QA benchmark.
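The 10x-restatement augmentation can be sketched as a loop over rule passages, with `generate` standing in for whatever LLM completion call was actually used (an assumption; the prompt wording is also mine):

```python
RESTATE_PROMPT = (
    "Restate the following rule in a new way, keeping every number "
    "and name exact:\n\n{fact}"
)

def augment(facts: list[str], generate, n_restatements: int = 10) -> list[str]:
    """Expand each rule passage into the original plus n paraphrases.

    `generate` is a placeholder for an LLM completion call: it takes a
    prompt string and returns one restatement.
    """
    dataset = []
    for fact in facts:
        dataset.append(fact)  # keep the original passage in the training mix
        for _ in range(n_restatements):
            dataset.append(generate(RESTATE_PROMPT.format(fact=fact)))
    return dataset
```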
- Next steps: assistant tuning and further improvements to reach 70% accuracy.