LLM Benchmark for 'Longform Creative Writing'
a year ago
- #LLM
- #Creative Writing
- #Benchmark
- EQ-Bench3 is an LLM-judged longform creative writing benchmark (v3).
- Models are evaluated through OpenRouter with generation settings temp=0.7 and min_p=0.1.
- Outputs are scored by Claude 3.7 Sonnet against a rubric.
- Average chapter length is measured in characters.
- Slop column tracks overused 'GPT-isms'—lower is better.
- Repetition column measures word/phrase repetition across tasks—higher means more repetition.
- Degradation score measures quality drop-off as the gradient of a trendline.
- Final rating is scaled 0–100 (higher is better).
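
To make the "trendline gradient" idea concrete, here is a minimal sketch of one way such a gradient could be computed. It assumes the judge produces one score per chapter and that the trendline is an ordinary least-squares fit against chapter index; the benchmark's actual calculation is not described here, so the function name `trendline_gradient` and the sample scores are hypothetical.

```python
def trendline_gradient(scores):
    """Least-squares slope of scores against their 0-based position.

    Assumption: one score per chapter, in writing order. A negative
    slope means quality falls as the piece goes on.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to fit a trendline")
    mean_x = (n - 1) / 2          # mean of the indices 0..n-1
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Hypothetical per-chapter judge scores, for illustration only.
chapter_scores = [80.0, 78.0, 76.0, 74.0]
print(trendline_gradient(chapter_scores))  # -2.0
```

Under these assumptions, a gradient near zero would indicate steady quality, while a more negative value would indicate a sharper drop-off in later chapters.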