Pico-Banana-400K

6 months ago

Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets for text-guided image editing research.
Each example includes an original image from Open Images, a human-like edit instruction, and an edited result generated and self-evaluated by Nano-Banana.
The dataset spans 35 edit operations across 8 semantic categories, covering diverse transformations.
It comprises ~257K single-turn triplets for supervised fine-tuning (SFT), ~56K examples for preference learning, and ~72K multi-turn samples.
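As a quick sanity check on the figures above, the three subset sizes can be summed and expressed as shares of the total. This is only arithmetic on the rounded numbers quoted in this summary; the dictionary keys are illustrative names, not identifiers from the dataset.

```python
# Approximate subset sizes quoted in this summary, in thousands of examples.
# The key names are illustrative, not official split names.
subset_sizes_k = {
    "single_turn_sft": 257,
    "preference": 56,
    "multi_turn": 72,
}

# Total across the three subsets, which the summary rounds to "~400K".
total_k = sum(subset_sizes_k.values())

# Share of each subset as a percentage of the total, to one decimal place.
shares = {
    name: round(100 * size / total_k, 1)
    for name, size in subset_sizes_k.items()
}

print(total_k)  # 385
print(shares)
```

The rounded subsets add up to roughly 385K, which is consistent with the headline figure of ~400K.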
Edit categories include Object-Level, Scene Composition, Human-Centric, Stylistic, Text & Symbol, Pixel & Photometric, Scale & Perspective, and Spatial/Layout.
The dataset is built using a two-stage pipeline: instruction generation via Gemini-2.5-Flash and editing/self-evaluation via Nano-Banana.
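The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `EditTriplet`, `build_triplet`, and the three callables are hypothetical names, and the callables stand in for the instruction model, the editing model, and the quality check without calling any real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EditTriplet:
    """One (source image, instruction, edited image) example. Hypothetical type."""
    source_image: str
    instruction: str
    edited_image: str


def build_triplet(
    source_image: str,
    write_instruction: Callable[[str], str],
    apply_edit: Callable[[str, str], str],
    passes_check: Callable[[str, str, str], bool],
) -> Optional[EditTriplet]:
    """Run both stages for one image; return None if the edit is rejected."""
    # Stage 1: generate an edit instruction for the source image.
    instruction = write_instruction(source_image)
    # Stage 2: perform the edit, then evaluate the result.
    edited_image = apply_edit(source_image, instruction)
    if not passes_check(source_image, instruction, edited_image):
        return None
    return EditTriplet(source_image, instruction, edited_image)


# Usage with stand-in functions (placeholders, not real model calls).
triplet = build_triplet(
    "cat.jpg",
    write_instruction=lambda img: "make the sky overcast",
    apply_edit=lambda img, text: "edited_" + img,
    passes_check=lambda img, text, out: True,
)
print(triplet)
```

Passing the models in as callables keeps the sketch independent of any particular client library; a rejected edit simply yields `None` so the caller can decide whether to retry or log the failure.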
Edits are quality-controlled by an automated judge that scores criteria such as Instruction Compliance and Editing Realism.
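One common way to turn per-criterion judgments into an accept/reject decision is a weighted sum compared against a threshold. The sketch below illustrates that idea only: the weights, the threshold, and the function names are assumptions made for the example and are not taken from this summary, which names the two criteria but gives no numbers.

```python
# Hypothetical weights and threshold, chosen only for illustration.
# Scores are assumed to lie in [0, 1].
WEIGHTS = {
    "instruction_compliance": 0.6,
    "editing_realism": 0.4,
}
PASS_THRESHOLD = 0.7


def judge_score(scores: dict) -> float:
    """Weighted sum of the per-criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def is_accepted(scores: dict) -> bool:
    """Accept the edit when the weighted score reaches the threshold."""
    return judge_score(scores) >= PASS_THRESHOLD


good_edit = {"instruction_compliance": 1.0, "editing_realism": 0.5}
weak_edit = {"instruction_compliance": 0.5, "editing_realism": 0.5}
print(is_accepted(good_edit), is_accepted(weak_edit))
```

Under such a scheme, rejected edits need not be discarded: paired with a successful edit for the same instruction, they are the kind of material a preference-learning subset can draw on.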
Pico-Banana-400K is hosted on Apple’s public CDN and available under the CC BY-NC-ND 4.0 license for non-commercial use.
Source images follow the Open Images (CC BY 2.0) license, and manifest files are provided for downloading components.
Citation details are provided for research use.
