Hasty Briefs


LLMs are bad at returning code in JSON

9 months ago
  • #LLM
  • #JSON
  • #Code Quality
  • LLMs produce lower quality code when returning it as part of a structured JSON response.
  • Benchmarks show models struggle with syntax errors in JSON-wrapped code, especially with quoting and escaping.
  • Plain text (markdown) outperforms JSON in both code quality and problem-solving ability.
  • OpenAI's 'strict' JSON mode offers no improvement over non-strict JSON for code quality.
  • Models like Claude-3-5-Sonnet and DeepSeek Coder suffer the most from JSON-wrapping.
  • JSON-wrapping may distract models, reducing their ability to reason about coding problems.
  • OpenAI's GPT-4o shows the least performance drop when using JSON, but plain text remains superior.
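To see why quoting and escaping are the main failure mode, here is a minimal sketch (the snippet and key name are illustrative, not from the benchmark) of what a model must emit when the same code is wrapped in a JSON string: every quote and newline becomes an escape sequence that has to be produced exactly right, token by token.

```python
import json

# A short code snippet as a model would emit it inside a markdown fence.
code = 'name = "world"\nprint(f"hello, {name}")'

# Wrapping the same code in a structured JSON response forces every
# double quote and newline in the code to be escaped.
wrapped = json.dumps({"code": code})
print(wrapped)
# The serialized form contains \" and \n escape sequences instead of
# literal quotes and line breaks.

# Round-tripping recovers the original, but only if every escape is
# exact; one missed backslash makes the whole payload unparseable.
assert json.loads(wrapped)["code"] == code
```

This extra escaping layer is plausibly where the syntax errors come from: in markdown the model writes the code verbatim, while in JSON it must interleave the program text with serialization bookkeeping.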