LLM Benchmark for 'Longform Creative Writing'

a year ago
  • #LLM
  • #Creative Writing
  • #Benchmark
  • EQ-Bench3 is an LLM-judged longform creative writing benchmark (v3).
  • Models are run through OpenRouter with generation settings temp=0.7 and min_p=0.1.
  • Outputs are scored by Claude Sonnet 3.7 based on a rubric.
  • Average chapter length is measured in characters.
  • The Slop column tracks overused 'GPT-isms' (lower is better).
  • The Repetition column measures word/phrase repetition across tasks (higher means more repetition).
  • The Degradation score expresses quality drop-off as the gradient of a fitted trendline.
  • Final rating is scaled 0–100 (higher is better).
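
To make the "trendline gradient" idea concrete, here is a minimal sketch of one way such a slope could be computed: an ordinary least-squares line fitted to a sequence of scores, with the slope taken as the gradient. This is an assumption for illustration, not the benchmark's confirmed method. The function name `trendline_gradient`, the idea that the scores are per-chapter, and the sample values are all hypothetical.

```python
from typing import Sequence


def trendline_gradient(scores: Sequence[float]) -> float:
    """Least-squares slope of scores against their 0-based position.

    A negative value means scores fall as the sequence goes on;
    zero means no trend. (Hypothetical helper, not the benchmark's code.)
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to fit a trendline")
    mean_x = (n - 1) / 2          # mean of positions 0..n-1
    mean_y = sum(scores) / n
    # slope = covariance(position, score) / variance(position)
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Made-up scores for illustration only (assumed to be one per chapter).
example_scores = [80.0, 78.0, 76.0, 74.0]
slope = trendline_gradient(example_scores)
print(f"gradient: {slope:.2f} points per step")
```

Under this sketch, a steeper negative slope would indicate faster quality drop-off over the course of a piece. How the benchmark turns its gradient into the reported Degradation score is not stated in the summary above.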