Hasty Briefs (beta)

Lance – natively supports image and video understanding, generation, and editing

3 hours ago
  • #lightweight-model
  • #multimodal-ai
  • #video-generation
  • Lance is a lightweight unified multimodal model supporting image/video understanding, generation, and editing in a single framework.
  • It has 3 billion active parameters and was trained from scratch with a multi-task recipe on a budget of 128 A100 GPUs.
  • The model performs strongly on benchmarks for image generation (e.g., GenEval, DPG), image editing (GEdit-Bench), and video generation (VBench).
  • It is used through a CLI for tasks such as text-to-image and text-to-video generation, editing, and understanding, with examples provided as JSON configs.
  • Installation involves cloning the repository, setting up a Conda environment, installing dependencies, and downloading model weights from Hugging Face.
  • Evaluation results show Lance competes with or outperforms larger models in various benchmarks despite its compact size.
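
The four installation steps summarized above (clone, Conda environment, dependencies, weights) can be sketched as a small Python helper that only assembles the commands and does not run them. This is an illustration under stated assumptions, not the project's documented procedure: the repository URL, the Hugging Face repo ID, the environment name, the Python version, the `requirements.txt` filename, and the `weights` directory are all hypothetical placeholders, and the sketch assumes `huggingface-cli` becomes available once the dependencies are installed.

```python
import shlex
from typing import List


def build_install_plan(
    repo_url: str,                 # placeholder: the real repository URL is not given here
    hf_repo_id: str,               # placeholder: the real Hugging Face repo ID is not given here
    env_name: str = "lance",       # assumed environment name
    python_version: str = "3.10",  # assumed Python version
    weights_dir: str = "weights",  # assumed target directory for the weights
) -> List[List[str]]:
    """Return the install steps as argument lists, without executing anything."""
    # Derive the checkout directory name the way `git clone` does by default.
    repo_dir = repo_url.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[: -len(".git")]

    return [
        # 1. Clone the repository.
        ["git", "clone", repo_url],
        # 2. Create a dedicated Conda environment.
        ["conda", "create", "-n", env_name, f"python={python_version}", "-y"],
        # 3. Install dependencies inside that environment
        #    (assumes the repo ships a requirements.txt).
        ["conda", "run", "-n", env_name,
         "pip", "install", "-r", f"{repo_dir}/requirements.txt"],
        # 4. Download the model weights from Hugging Face.
        ["conda", "run", "-n", env_name,
         "huggingface-cli", "download", hf_repo_id,
         "--local-dir", f"{repo_dir}/{weights_dir}"],
    ]


if __name__ == "__main__":
    # Both arguments below are hypothetical placeholders.
    plan = build_install_plan(
        "https://example.com/your-org/lance.git",
        "your-org/lance-weights",
    )
    for command in plan:
        print(shlex.join(command))
```

Printing the plan rather than executing it makes it easy to substitute the real URL and repo ID from the project's README before running anything.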