Hasty Briefs

Pico-Banana-400K

6 months ago
  • #image-editing
  • #dataset
  • #multimodal
  • Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets for text-guided image editing research.
  • Each example includes an original image from Open Images, a human-like edit instruction, and an edited image produced by Nano-Banana and checked by an automated judge.
  • The dataset spans 35 edit operations across 8 semantic categories, covering diverse transformations.
  • The data is split into three subsets: ~257K single-turn triplets for supervised fine-tuning (SFT), ~56K single-turn examples for preference learning, and ~72K multi-turn editing samples.
  • Edit categories include Object-Level, Scene Composition, Human-Centric, Stylistic, Text & Symbol, Pixel & Photometric, Scale & Perspective, and Spatial/Layout.
  • The dataset is built using a two-stage pipeline: instruction generation via Gemini-2.5-Flash and editing/self-evaluation via Nano-Banana.
  • Edit quality is controlled by automated judging, which scores each result on criteria such as Instruction Compliance and Editing Realism.
  • Pico-Banana-400K is hosted on Apple’s public CDN and released under the CC BY-NC-ND 4.0 license, which permits non-commercial use only.
  • Source images remain under the Open Images license (CC BY 2.0), and manifest files are provided for downloading each component of the dataset.
  • Citation details are provided for research use.
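
The structure described in the points above can be sketched in Python. This is only an illustration under stated assumptions: the field names, the `build_example` helper, and the three callables passed to it are hypothetical stand-ins and do not reflect the dataset's actual manifest schema or any real model API. Only the category names and the approximate subset sizes come from the summary.

```python
from dataclasses import dataclass
from enum import Enum


class EditCategory(Enum):
    # The eight semantic categories named in the summary.
    OBJECT_LEVEL = "Object-Level"
    SCENE_COMPOSITION = "Scene Composition"
    HUMAN_CENTRIC = "Human-Centric"
    STYLISTIC = "Stylistic"
    TEXT_AND_SYMBOL = "Text & Symbol"
    PIXEL_AND_PHOTOMETRIC = "Pixel & Photometric"
    SCALE_AND_PERSPECTIVE = "Scale & Perspective"
    SPATIAL_LAYOUT = "Spatial/Layout"


@dataclass(frozen=True)
class EditTriplet:
    # Hypothetical field names; the real manifest schema may differ.
    source_image_url: str
    instruction: str
    edited_image_path: str
    category: EditCategory


# Approximate subset sizes quoted in the summary, in thousands.
SUBSET_SIZES_K = {"sft_single_turn": 257, "preference": 56, "multi_turn": 72}


def total_examples_k(sizes):
    """Sum the approximate subset sizes, in thousands."""
    return sum(sizes.values())


def build_example(image_url, category, write_instruction, apply_edit, judge):
    """Two-stage sketch: write an instruction, then edit and judge the result.

    The three callables are placeholders for the model calls (assumptions).
    """
    instruction = write_instruction(image_url, category)   # stage 1
    edited_path = apply_edit(image_url, instruction)        # stage 2: edit
    accepted = judge(image_url, instruction, edited_path)   # stage 2: evaluate
    triplet = EditTriplet(image_url, instruction, edited_path, category)
    return triplet, accepted
```

Summing the three quoted subset sizes gives about 385K examples, which is consistent with "~400K" being a rounded headline figure.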