Hasty Briefs

A Year of Fast Apply – Our Path to 10k Tokens per Second

6 months ago
  • #AI
  • #Machine Learning
  • #Software Development
  • Released the Fast Apply model a year ago, focusing on fine-tuning small, specialized models for code-specific tasks.
  • Open-sourced training insights leading to Relace Apply 3, capable of 10k+ tokens per second with state-of-the-art accuracy.
  • Highlighted the inefficiency of regenerating unchanged code with expensive LLMs, proposing a lightweight diff application solution.
  • Introduced the concept of using an LLM as a merge algorithm to handle pathological diffs and infer intent, improving accuracy.
  • Detailed dataset production for training, emphasizing quality and diversity over size, with a focus on real production data.
  • Explained the evaluation process for merges, categorizing outcomes into six types to ensure high-quality training data.
  • Utilized LLM-as-a-judge to scale up dataset filtering, achieving a low false positive rate for reliable training examples.
  • Adopted LoRA for efficient model training, allowing specialization without catastrophic forgetting of general coding knowledge.
  • Achieved 10k tok/s with speculative decoding, leveraging strong priors in code merging for parallel token processing.
  • Showcased Relace Apply 3's improvements in merge accuracy, context length, and speed, positioning it as a market leader.
  • Reflected on Fast Apply's impact over the past year, highlighting its role in making structured code edits reliable.
  • Announced hiring for researchers and engineers to continue developing specialized models for coding tasks.
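The diff-application idea in the bullets above can be illustrated with a toy splice: instead of regenerating the whole file, insert new lines between unchanged context anchors. This is a minimal sketch with hypothetical names, not the article's actual method; the article's point is that a small LLM, rather than a rule like this, performs the merge, which is what lets it recover on pathological diffs where anchors are ambiguous or missing.

```python
def apply_edit(original: str, before_ctx: str, after_ctx: str, new_lines: list[str]) -> str:
    """Splice `new_lines` between two unique anchor lines, keeping
    everything outside the anchors untouched (no regeneration of
    unchanged code).

    A rule like this breaks when anchors are duplicated or absent;
    that is exactly the pathological case where an LLM acting as the
    merge algorithm can still infer intent.
    """
    lines = original.splitlines()
    i = lines.index(before_ctx)        # last unchanged line before the edit
    j = lines.index(after_ctx, i + 1)  # first unchanged line after the edit
    return "\n".join(lines[: i + 1] + new_lines + lines[j:])

src = "def f(x):\n    y = x + 1\n    return y"
print(apply_edit(src, "def f(x):", "    return y", ["    y = x * 2"]))
# → def f(x):
#       y = x * 2
#       return y
```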
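The LLM-as-a-judge filtering step can be sketched as a simple keep/drop loop over candidate training triples. The judge here is a stub standing in for a real model call, and the example fields (`merged`, `must_keep`) are invented for illustration; in practice the judge would be a strong LLM prompted with the full triple and parsed for a verdict.

```python
from typing import Callable

def filter_with_judge(
    examples: list[dict],
    judge: Callable[[dict], bool],
) -> list[dict]:
    """Keep only the examples the judge marks as correct merges.

    `judge` stands in for an LLM-as-a-judge call; the goal of the
    filtering pass is a low false positive rate, so that only
    reliable examples survive into the training set.
    """
    return [ex for ex in examples if judge(ex)]

# Stub judge for the sketch: flag merges that dropped code the edit never touched.
def stub_judge(ex: dict) -> bool:
    return all(line in ex["merged"] for line in ex["must_keep"])

data = [
    {"merged": "a\nb\nc", "must_keep": ["a", "c"]},  # kept
    {"merged": "a\nc",    "must_keep": ["a", "b"]},  # dropped: lost "b"
]
print(len(filter_with_judge(data, stub_judge)))  # → 1
```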
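Why LoRA avoids catastrophic forgetting can be seen from its arithmetic: the base weight matrix is frozen, and only a low-rank correction is trained on top of it. A NumPy sketch of the forward pass, with toy dimensions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                   # hidden size and LoRA rank (toy values)

W = rng.normal(size=(d, d))   # frozen base weight: general coding knowledge stays intact
A = rng.normal(size=(r, d))   # trainable down-projection
B = np.zeros((d, r))          # trainable up-projection, zero-initialized
alpha = 16                    # LoRA scaling hyperparameter

def lora_forward(x):
    # Output = base path + low-rank learned correction (alpha/r) * B @ A.
    # Only A and B (2*d*r params) are updated, never W (d*d params).
    return x @ W.T + x @ ((alpha / r) * B @ A).T

x = rng.normal(size=(1, d))
# With B = 0, the adapter starts as an exact no-op on the base model:
assert np.allclose(lora_forward(x), x @ W.T)
```

The zero-initialized `B` means training starts from the unmodified base model and specializes it gradually, which is the property the article relies on for specialization without forgetting.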
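The speculative-decoding speedup comes from a strong prior specific to code merging: most of the output is a verbatim copy of the input file, so a cheap draft (the original code itself) is usually right, and the model only needs one forward pass to verify a whole window of draft tokens. A toy simulation of the acceptance loop, where exact match against a known target stands in for verification by model logits:

```python
def speculative_steps(target: list[str], draft: list[str], k: int = 8) -> int:
    """Count verifier forward passes when up to k draft tokens are
    checked per pass; each pass commits the accepted prefix plus one
    corrected token from the verifier. Toy model: 'verification' is
    exact match against the known target, standing in for logits.
    """
    pos, passes = 0, 0
    while pos < len(target):
        passes += 1
        accepted = 0
        while (accepted < k and pos + accepted < len(target)
               and pos + accepted < len(draft)
               and draft[pos + accepted] == target[pos + accepted]):
            accepted += 1
        pos += accepted + 1  # accepted prefix + one verifier token
    return passes

original = "def f ( x ) : return x + 1".split()
merged   = "def f ( x ) : return x * 2".split()
# Draft = copy of the original file; unchanged runs are accepted in bulk.
print(speculative_steps(merged, original), "passes vs", len(merged), "sequential")
# → 2 passes vs 10 sequential
```

When the diff touches only a small fraction of the file, accepted runs are long and the pass count approaches the number of edited regions rather than the number of tokens, which is what makes 10k tok/s reachable.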