Learning to Replicate Expert Judgment in Financial Tasks
9 hours ago
- #AI Training
- #LLM Fine-tuning
- #Financial Judgment
- Outperforming the market requires unique insight that comes from investor judgment, and that judgment is difficult to articulate or teach directly.
- LLMs struggle with simple financial tasks like filtering and processing documents, even though these are routine for investors.
- The post explores automating information triage using LLMs, showing that with expert annotations, proprietary models can achieve expert-level judgment.
- Frontier models (e.g., Gemini, Claude, GPT) underperform on six filtering tasks, with accuracy in the 50-80% range, generally below the 80% threshold for trust.
- Improved prompting raised accuracy to the mid-70% range, but fine-tuning on high-quality human-labeled data was necessary for further gains.
- A custom training dataset was built by having experts verify and correct non-expert labels, which improved data quality.
- The training recipe used Qwen3-235B as a base model, with techniques like interleaved batching, CISPO loss with asymmetric clipping, and on-policy distillation.
- The final proprietary model achieved 84.7% accuracy, making 29.8% fewer mistakes than frontier models, with a 13.8x reduction in inference costs.
- The conclusion highlights that custom models tuned to organizational needs outperform frontier models in accuracy and cost, enabling differentiated intelligence.
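
The headline numbers above can be sanity-checked with a little arithmetic. The sketch below is not from the post. It assumes that "29.8% fewer mistakes" is a relative reduction in error rate against a single frontier baseline, and that "13.8x" is a ratio of per-query inference cost. Neither assumption is confirmed by the summary, and the function and variable names are illustrative.

```python
def implied_baseline_error(model_accuracy: float, relative_error_reduction: float) -> float:
    """Back out the baseline error rate implied by a relative error reduction.

    Assumes: model_error = baseline_error * (1 - relative_error_reduction).
    """
    model_error = 1.0 - model_accuracy
    return model_error / (1.0 - relative_error_reduction)


# Figures quoted in the summary.
model_accuracy = 0.847            # final proprietary model
relative_error_reduction = 0.298  # "29.8% fewer mistakes"
cost_reduction_factor = 13.8      # "13.8x reduction in inference costs"

baseline_error = implied_baseline_error(model_accuracy, relative_error_reduction)
baseline_accuracy = 1.0 - baseline_error

# Under the cost-ratio assumption, the custom model's cost as a fraction of the baseline's.
cost_fraction = 1.0 / cost_reduction_factor

print(f"implied baseline error:    {baseline_error:.3f}")     # 0.218
print(f"implied baseline accuracy: {baseline_accuracy:.3f}")  # 0.782
print(f"cost relative to baseline: {cost_fraction:.3f}")      # 0.072
```

Under these assumptions, the comparison baseline would sit at about 78% accuracy, which is consistent with the top of the 50-80% range reported for frontier models, and the custom model would cost roughly 7% as much per query.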