Domain Adaptation of Base Models + ShadowdarkQA Bench
- #Shadowdark
- #LLM
- #TTRPG
- Two approaches to developing an autonomous LLM Game Master: a fast path (agentic scaffolding) vs. a slow path (hands-on model training).
- Goal is understanding model capabilities and gaining hands-on experience, not just the end product.
- Starting with base models to bake in TTRPG-specific priors for better rule understanding.
- Compute constraints led to choosing smaller models from the Qwen3 series (0.6B to 14B).
- Shadowdark RPG chosen over D&D for its simplicity, the models' lack of prior knowledge of it, and ease of answer verification.
- OCR used to extract Shadowdark rules into clean markdown format for training data.
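A minimal sketch of the kind of OCR post-processing this implies; the specific heuristics (hyphenation repair, promoting ALL-CAPS section titles to headings) are assumptions, not the author's actual pipeline:

```python
import re

def clean_ocr_page(raw: str) -> str:
    """Normalize raw OCR output from a rulebook page into markdown-ish text."""
    # Re-join words hyphenated across line breaks ("cast-\ning" -> "casting").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Promote short ALL-CAPS lines (likely section titles) to markdown headings.
        if stripped and stripped.isupper() and len(stripped.split()) <= 6:
            lines.append(f"## {stripped.title()}")
        else:
            lines.append(stripped)
    # Collapse runs of blank lines left behind by page layout.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```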
- Created Shadowdark QA Bench with categories like spell_mechanics, player_characters, monsters, etc.
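One bench item might be stored as a JSONL record along these lines; the field names and the sample fact are illustrative assumptions, not the benchmark's actual schema:

```python
import json

# Hypothetical ShadowdarkQA item: a category, a question, and the keywords
# a correct answer is expected to contain.
item = {
    "category": "spell_mechanics",
    "question": "What is the duration of the Light spell?",
    "keywords": ["1 hour", "real time"],
}

jsonl_line = json.dumps(item)  # one item per line in the bench file
```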
- Evaluation metric: keyword-based answer matching for precise grading of rule recall.
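Keyword grading can be as simple as requiring every expected keyword to appear in the model's answer; a sketch under that assumption (function names are mine, not the bench's):

```python
def grade_answer(answer: str, keywords: list[str]) -> bool:
    """Pass only if every expected keyword appears, case-insensitively."""
    text = answer.lower()
    return all(kw.lower() in text for kw in keywords)

def accuracy(items: list[dict], answers: list[str]) -> float:
    """Fraction of bench items whose model answer contains all keywords."""
    hits = sum(
        grade_answer(ans, item["keywords"]) for item, ans in zip(items, answers)
    )
    return hits / len(items)
```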
- Initial training on the sourcebooks improved performance, but the models struggled with numerical recall.
- Knowledge augmentation (10x restatements) boosted performance to 66.6% on QA benchmark.
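The 10x-restatement augmentation can be sketched as a loop over rule passages, with `generate` standing in for whatever LLM completion call was actually used (an assumption; the prompt wording is also mine):

```python
RESTATE_PROMPT = (
    "Restate the following rule in a new way, keeping every number "
    "and name exact:\n\n{fact}"
)

def augment(facts: list[str], generate, n_restatements: int = 10) -> list[str]:
    """Expand each rule passage into the original plus n paraphrases.

    `generate` is a placeholder for an LLM completion call: it takes a
    prompt string and returns one restatement.
    """
    dataset = []
    for fact in facts:
        dataset.append(fact)  # keep the original passage in the training mix
        for _ in range(n_restatements):
            dataset.append(generate(RESTATE_PROMPT.format(fact=fact)))
    return dataset
```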
- Next steps: assistant tuning and further improvements to reach 70% accuracy.