Lance – natively supports image and video understanding, generation, and editing
2 hours ago
- #lightweight-model
- #multimodal-ai
- #video-generation
- Lance is a lightweight unified multimodal model supporting image/video understanding, generation, and editing in a single framework.
- It has 3 billion active parameters and was trained from scratch with a multi-task recipe on a budget of 128 A100 GPUs.
- The model demonstrates strong performance on benchmarks for image generation (e.g., GenEval, DPG), image editing (GEdit-Bench), and video generation (VBench).
- The model is run through a CLI that covers text-to-image/video generation, editing, and understanding; example inputs are provided as JSON configs.
- Installation involves cloning the repository, setting up a Conda environment, installing dependencies, and downloading model weights from Hugging Face.
- Evaluation results show Lance matching or outperforming larger models on various benchmarks despite its compact size.
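
The JSON-config workflow and the Hugging Face weight download mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the config keys (`task`, `prompt`, `seed`) and the helper names `write_task_config` and `download_weights` are hypothetical and are not Lance's actual schema or API, and the Hugging Face repository id is deliberately left for the caller because this summary does not state it. The only third-party call used is `huggingface_hub.snapshot_download`.

```python
import json
from pathlib import Path


def write_task_config(path, task, prompt, **extra):
    """Write one task description to a JSON file and return the dict written.

    The keys "task" and "prompt" are illustrative placeholders, not Lance's
    real config schema; consult the repository's example configs for that.
    """
    config = {"task": task, "prompt": prompt, **extra}
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


def download_weights(repo_id, local_dir):
    """Fetch a model snapshot from Hugging Face into local_dir.

    repo_id must be supplied by the caller: the actual Lance repository id
    is not given in this summary. Requires `pip install huggingface_hub`.
    """
    # Imported lazily so the config helper works without this dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # Hypothetical text-to-image request written to disk as JSON.
    cfg = write_task_config(
        "t2i_example.json",
        task="text-to-image",
        prompt="a red bicycle leaning on a brick wall",
        seed=0,
    )
    print(json.dumps(cfg))
```

Keeping each request in its own JSON file makes a run easy to reproduce and to share; the real field names and the CLI entry point should be taken from the project's repository rather than from this sketch.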