Pico-Banana-400K
6 months ago
- #image-editing
- #dataset
- #multimodal
- Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets for text-guided image editing research.
- Each example includes an original image from Open Images, a human-like edit instruction, and an edited image produced by Nano-Banana that passed automated verification.
- The dataset spans 35 edit operations grouped into 8 semantic categories.
- The data is split into three subsets: ~257K single-turn triplets for supervised fine-tuning (SFT), ~56K single-turn examples for preference learning, and ~72K multi-turn samples.
- Edit categories include Object-Level, Scene Composition, Human-Centric, Stylistic, Text & Symbol, Pixel & Photometric, Scale & Perspective, and Spatial/Layout.
- The dataset is built with a two-stage pipeline: Gemini-2.5-Flash generates the edit instructions, then Nano-Banana performs the edits and self-evaluates them.
- Edits are quality-controlled by automated judging that scores criteria such as Instruction Compliance and Editing Realism.
- Pico-Banana-400K is hosted on Apple’s public CDN and available under the CC BY-NC-ND 4.0 license for non-commercial use.
- Source images follow the Open Images (CC BY 2.0) license, and manifest files are provided for downloading each component of the dataset.
- Citation details are provided for research use.
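
As a rough illustration of the figures quoted above, the following Python sketch tallies the three approximate subset sizes and checks the category count. The dictionary keys and variable names are illustrative labels chosen for this example, not the dataset's actual split or field names, and the counts are the rounded values from this summary rather than exact totals.

```python
# Approximate subset sizes as quoted in the summary (rounded).
# The keys are illustrative labels, not official split names.
SUBSET_SIZES = {
    "single_turn_sft": 257_000,
    "preference": 56_000,
    "multi_turn": 72_000,
}

# The 8 semantic edit categories listed in the summary.
EDIT_CATEGORIES = (
    "Object-Level",
    "Scene Composition",
    "Human-Centric",
    "Stylistic",
    "Text & Symbol",
    "Pixel & Photometric",
    "Scale & Perspective",
    "Spatial/Layout",
)


def subset_shares(sizes):
    """Return each subset's fraction of the combined total."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SUBSET_SIZES.values())
shares = subset_shares(SUBSET_SIZES)

print(f"categories: {len(EDIT_CATEGORIES)}")
print(f"approx. total examples: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The rounded counts sum to about 385K, which is in line with the ~400K headline figure; roughly two thirds of the examples fall in the single-turn SFT subset.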