LightlyStudio – an open-source multimodal data curation and labeling tool
a day ago
- #data-annotation
- #machine-learning
- #open-source
- LightlyStudio is an open-source tool for data curation, annotation, and management.
- Built with Rust for performance, it can work with COCO and ImageNet datasets on a MacBook Pro with an M1 chip and 16 GB of RAM.
- Compatible with Python 3.8+ on Windows, Linux, and macOS.
- Install via pip: `pip install lightly-studio`.
- Example datasets can be downloaded from a GitHub repository, or you can use your own YOLO/COCO dataset.
- Includes examples for image-only datasets, YOLO object detection, COCO instance segmentation, and COCO captions.
- LightlyStudio features a powerful Python interface for dataset indexing, querying, and manipulation.
- Supports loading data from cloud storage (e.g., S3, GCS) and local folders.
- Sample attributes include ID, file name, path, tags, and metadata, which can be accessed and modified.
- Dataset queries allow filtering, sorting, and slicing operations using expressions.
- Automated data selection, a premium feature, picks the most useful samples based on typicality and diversity.
- Version 0.4.0 released as a preview on 2025-10-21.
- Contributions are welcome; the issues page lists open tasks and improvements.
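
To make the filter, sort, and slice idea from the summary concrete, here is a minimal sketch in plain Python. It does not use LightlyStudio's actual API: the `Sample` class, the `query` helper, and all field and parameter names are hypothetical stand-ins chosen to mirror the sample attributes listed above (ID, file name, path, tags, metadata).

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Hypothetical record mirroring the attributes named in the summary.
    sample_id: int
    file_name: str
    path: str
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)


def query(samples, where=None, order_by=None, descending=False, offset=0, limit=None):
    """Filter, then sort, then slice a list of samples (illustrative helper)."""
    # Filter: keep samples for which the predicate holds (or all if none given).
    result = [s for s in samples if where is None or where(s)]
    # Sort: order by a key function when one is provided.
    if order_by is not None:
        result = sorted(result, key=order_by, reverse=descending)
    # Slice: skip `offset` items, then take at most `limit` items.
    end = None if limit is None else offset + limit
    return result[offset:end]


samples = [
    Sample(1, "cat.jpg", "data/cat.jpg", {"train"}, {"width": 640}),
    Sample(2, "dog.jpg", "data/dog.jpg", {"val"}, {"width": 1280}),
    Sample(3, "bird.jpg", "data/bird.jpg", {"train"}, {"width": 1920}),
]

# Widest image among the samples tagged "train".
selected = query(
    samples,
    where=lambda s: "train" in s.tags,
    order_by=lambda s: s.metadata["width"],
    descending=True,
    limit=1,
)

# Attributes can be modified in place, e.g. adding a tag to the result.
selected[0].tags.add("needs-labeling")
print([s.file_name for s in selected])  # ['bird.jpg']
```

The tool itself is described as using query expressions rather than lambdas, so treat this only as an illustration of the filter, sort, and slice order of operations.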