Hasty Briefs

Eval-maxing an AI FFmpeg command generator

18 days ago
  • #Kiln
  • #Fine-Tuning
  • #AI Development
  • Creating an AI project from start to finish with Kiln.
  • Covered steps include creating evals, generating synthetic data, and validating with human ratings.
  • Evaluating prompt/model pairs to find the best way to run tasks.
  • Fine-tuning models with synthetic training data and evaluating results.
  • Iterating on the project with new evals and prompts as it evolves.
  • Setting up collaboration using Git and GitHub.
  • Demo project: a natural-language-to-FFmpeg command builder (a minimal sketch of the task follows this list).
  • Key findings: GPT-4.1 outperformed the other models, and fine-tuning boosted performance by 21%.
  • Initial high eval scores were tempered by bugs, requiring iteration on product evals.
  • Process included creating correctness evals, generating synthetic data, and manually labeling results (the synthetic-data and eval steps are sketched after this list).
  • Experimenting with prompts and models revealed GPT-4.1's dominance (see the comparison sketch below).
  • Fine-tuning covered various base models and providers, with promising results (see the fine-tuning sketch at the end).
  • Iteration included fixing bugs, adding product goals, and setting up Git collaboration.
  • Next steps: improve evals, iterate on model+prompt, and consider more fine-tuning if needed.
  • Kiln is a free, open tool for optimizing AI systems.
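
The brief doesn't include code, but the core task is easy to picture. Here is a minimal, hypothetical sketch of the generator itself, calling the OpenAI Python SDK directly rather than going through Kiln's app; the prompt wording, function name, and example request are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are an FFmpeg expert. Given a plain-English request, reply with a "
    "single runnable ffmpeg command and nothing else."
)

def build_ffmpeg_command(request: str, model: str = "gpt-4.1",
                         system_prompt: str = SYSTEM_PROMPT) -> str:
    """Translate a natural-language request into an ffmpeg command."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": request},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(build_ffmpeg_command("Convert input.mov to a 720p MP4 with H.264 video and AAC audio"))
```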
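
Synthetic data generation can be approximated the same way: ask a strong model for varied user requests, which later become eval inputs and fine-tuning examples. Kiln has its own synthetic-data tooling, so treat this as an illustrative stand-in; the prompt text and `topic_hints` are invented. It reuses `client` from the sketch above.

```python
def generate_synthetic_requests(n: int, topic_hints: list[str],
                                model: str = "gpt-4.1") -> list[str]:
    """Generate varied user requests to use as eval and training inputs."""
    prompt = (
        f"Write {n} distinct, realistic requests a user might make of an "
        f"ffmpeg assistant. Cover topics such as: {', '.join(topic_hints)}. "
        "Return exactly one request per line, with no numbering."
    )
    response = client.chat.completions.create(  # `client` from the first sketch
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.lstrip("-• ").strip() for line in lines if line.strip()]

requests = generate_synthetic_requests(
    50, ["format conversion", "trimming", "audio extraction", "subtitles", "GIF creation"]
)
```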
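
For the correctness eval, the article relies on model-based evals validated against human ratings. A bare-bones LLM-as-judge version might look like this; the PASS/FAIL rubric is an assumption, and judge labels should be spot-checked against human ratings as the post describes.

```python
JUDGE_PROMPT = """You are grading an AI-generated ffmpeg command.

User request:
{request}

Generated command:
{command}

Reply with exactly one word: PASS if the command is valid ffmpeg syntax and
would accomplish the request, otherwise FAIL."""

def judge_correctness(request: str, command: str,
                      judge_model: str = "gpt-4.1") -> bool:
    """LLM-as-judge correctness check."""
    response = client.chat.completions.create(  # `client` from the first sketch
        model=judge_model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(request=request, command=command)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")
```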
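
Comparing prompt/model pairs then reduces to running a shared eval set through each combination and tallying judge pass rates, roughly as below. The model list and prompt variants are placeholders; it builds on `build_ffmpeg_command` and `judge_correctness` from the earlier sketches.

```python
from itertools import product
from statistics import mean

MODELS = ["gpt-4.1", "gpt-4.1-mini", "gpt-4o-mini"]   # illustrative candidates
PROMPTS = {
    "zero_shot": SYSTEM_PROMPT,
    "few_shot": SYSTEM_PROMPT + "\n\nExamples:\n<a few request/command pairs>",
}

def compare_pairs(eval_requests: list[str]) -> None:
    """Run every model+prompt pair over a shared eval set and print pass rates."""
    for model, (prompt_name, prompt) in product(MODELS, PROMPTS.items()):
        passes = [
            judge_correctness(req, build_ffmpeg_command(req, model=model,
                                                        system_prompt=prompt))
            for req in eval_requests
        ]
        print(f"{model:>14} / {prompt_name:<9}: {mean(passes):.0%} pass rate")
```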
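
Finally, the fine-tuning step. Kiln dispatches tunes to several providers; a direct OpenAI-only version would mean writing the human-approved examples to chat-format JSONL and starting a job, roughly as follows. The base-model snapshot name is an assumption; check your provider's list of tunable models.

```python
import json

def write_training_file(examples: list[dict], path: str = "ffmpeg_train.jsonl") -> str:
    """Write approved (request, command) pairs in chat-format JSONL."""
    with open(path, "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "system", "content": SYSTEM_PROMPT},  # from the first sketch
                {"role": "user", "content": ex["request"]},
                {"role": "assistant", "content": ex["command"]},
            ]}
            f.write(json.dumps(record) + "\n")
    return path

def start_finetune(path: str, base_model: str = "gpt-4.1-mini-2025-04-14") -> str:
    """Upload the training file and start a fine-tuning job; returns the job id."""
    uploaded = client.files.create(file=open(path, "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=uploaded.id, model=base_model)
    return job.id
```

Re-running the same judge eval on the tuned model is what makes a claim like the article's 21% improvement measurable.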