LLMs are bad at returning code in JSON
9 months ago
- #LLM
- #JSON
- #Code Quality
- LLMs produce lower-quality code when asked to return it inside a structured JSON response.
- Benchmarks show models make more syntax errors when code is wrapped in JSON, largely due to mishandled quoting and escaping.
- Plain text (markdown) outperforms JSON in both code quality and problem-solving performance.
- OpenAI's 'strict' JSON mode offers no improvement over non-strict JSON for code quality.
- Models like Claude 3.5 Sonnet and DeepSeek Coder suffer the most from JSON-wrapping.
- JSON-wrapping may distract models, reducing their ability to reason about coding problems.
- OpenAI's GPT-4o shows the smallest performance drop when using JSON, but plain text still wins.
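
The quoting-and-escaping burden mentioned above can be sketched with a small example (not from the original benchmark, just an illustration). Code that reads naturally inside a markdown fence must be escaped character-by-character to survive inside a JSON string, and a single missed escape makes the entire response unparseable:

```python
import json

# Code a model might want to return: contains quotes, a backslash escape,
# and a trailing newline.
code = 'print("hello\\nworld")\n'

# In JSON, every quote, backslash, and newline must be escaped correctly.
wrapped = json.dumps({"code": code})
print(wrapped)  # {"code": "print(\"hello\\nworld\")\n"}

# Simulate a common model mistake: emitting a raw newline where the
# two-character escape sequence \n was required.
broken = wrapped.replace('\\n', '\n', 1)
try:
    json.loads(broken)
except json.JSONDecodeError as e:
    print("entire response lost to one bad escape:", e)
```

A markdown fence has no such failure mode: the code is emitted verbatim, so the model spends no capacity tracking escape state, which is consistent with the "distraction" explanation in the bullets above.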