Hasty Briefs


Domain Adaptation of Base Models + ShadowdarkQA Bench

a year ago
  • #Shadowdark
  • #LLM
  • #TTRPG
  • Two approaches to developing an autonomous LLM Game Master: fast (agentic) vs. slow (hands-on experience).
  • Goal is understanding model capabilities and gaining hands-on experience, not just the end product.
  • Starting with base models to bake in TTRPG-specific priors for better rule understanding.
  • Compute constraints led to choosing smaller models from the Qwen3 series (0.6B to 14B).
  • Shadowdark RPG chosen over D&D for its simplicity, the models' lack of prior knowledge of it, and ease of verification.
  • OCR used to extract Shadowdark rules into clean markdown format for training data.
  • Created Shadowdark QA Bench with categories like spell_mechanics, player_characters, monsters, etc.
  • Evaluation metric: keyword-based matching for precise grading of rule recall.
  • Initial training on the sourcebooks improved performance, but the model struggled with numerical recall.
  • Knowledge augmentation (10x restatements) boosted performance to 66.6% on the QA benchmark.
  • Next steps: assistant tuning and further improvements to reach 70% accuracy.
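The OCR-to-markdown step could look something like the sketch below. The post doesn't name the OCR tool or cleanup rules, so this is only an illustrative post-OCR normalization pass, with the function name and regexes as assumptions:

```python
import re

# Hypothetical post-OCR cleanup pass; the actual OCR tool and cleanup
# rules used in the post are not specified.
def clean_ocr_markdown(text: str) -> str:
    """Normalize common OCR artifacts into tidy markdown."""
    text = text.replace("\u00ad", "")        # drop soft hyphens
    text = re.sub(r"-\n(\w)", r"\1", text)   # re-join words hyphenated across lines
    text = re.sub(r"[ \t]+\n", "\n", text)   # strip trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)   # collapse runs of blank lines
    return text.strip()
```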
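The QA bench plus keyword grading could be sketched as follows. The entry schema, field names, and sample questions are assumptions for illustration; the post only says the bench has categories like spell_mechanics and monsters and grades by keyword matching:

```python
# Hypothetical bench schema and keyword grader; field names and the
# all-keywords-must-match rule are assumptions, not the post's exact format.
BENCH = [
    {
        "category": "spell_mechanics",
        "question": "What is the range of the magic missile spell?",
        "keywords": ["far"],  # illustrative expected keywords
    },
    {
        "category": "monsters",
        "question": "How many hit points does a goblin have?",
        "keywords": ["4"],
    },
]

def grade(answer: str, keywords: list[str]) -> bool:
    """Correct only if every keyword appears as a case-insensitive substring."""
    text = answer.lower()
    return all(k.lower() in text for k in keywords)

def score(predictions: list[str]) -> float:
    """Fraction of bench questions whose prediction passes the keyword check."""
    correct = sum(
        grade(pred, item["keywords"])
        for pred, item in zip(predictions, BENCH)
    )
    return correct / len(BENCH)
```

Substring matching is deliberately strict on numbers ("4" must appear verbatim), which is what makes the bench sensitive to the numerical-recall weakness noted above.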
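The "10x restatements" augmentation could be sketched as a prompt-fanout step. The prompt wording and the use of a helper model to paraphrase are assumptions; the post only says each rule was restated roughly ten times so numeric details appear many times in training:

```python
# Hypothetical augmentation step: build one paraphrase request per
# (rule chunk, restatement) pair. The prompt template is an assumption.
RESTATEMENT_PROMPT = (
    "Restate the following Shadowdark rule in different words, "
    "keeping every number and game term exactly the same:\n\n{rule}"
)

def make_augmentation_prompts(rule_chunks: list[str], n_restatements: int = 10) -> list[str]:
    """Fan each rule chunk out into n_restatements paraphrase prompts.

    Each prompt would be sent to a teacher model; the replies become
    extra training documents alongside the original markdown.
    """
    return [
        RESTATEMENT_PROMPT.format(rule=chunk)
        for chunk in rule_chunks
        for _ in range(n_restatements)
    ]
```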