Lance – natively supports image and video understanding, generation, and editing


Lance is a lightweight unified multimodal model supporting image/video understanding, generation, and editing in a single framework.
It runs with 3 billion active parameters and was trained from scratch with a multi-task recipe on a budget of 128 A100 GPUs.
The model performs strongly on benchmarks for image generation (e.g., GenEval, DPG), image editing (GEdit-Bench), and video generation (VBench).
A CLI covers tasks such as text-to-image and text-to-video generation, editing, and understanding, with example JSON configs provided.
Installation involves cloning the repository, setting up a Conda environment, installing the dependencies, and downloading the model weights from Hugging Face.
Evaluation results show that Lance matches or outperforms larger models on several benchmarks despite its compact size.
