Hasty Briefs (beta)

  • #Vision
  • #AI
  • #Gemini
  • Agentic Vision in Gemini 3 Flash transforms image understanding into an active, agentic process.
  • It combines visual reasoning with code execution to zoom in, inspect, and manipulate images step-by-step.
  • Agentic Vision introduces a Think, Act, Observe loop for image tasks.
  • Think: The model formulates a multi-step plan based on the query and initial image.
  • Act: It generates and executes Python code to manipulate or analyze images.
  • Observe: The transformed image is appended to the context window so the model can inspect it before producing its final answer.
  • Enabling code execution with Gemini 3 Flash improves results on vision benchmarks by 5-10%.
  • Use cases include zooming and inspecting, image annotation, and visual math/plotting.
  • PlanCheckSolver.com improved accuracy by 5% using Agentic Vision for building plan validation.
  • Gemini 3 Flash can annotate images by drawing bounding boxes and labels for precise understanding.
  • It performs visual math by parsing tables and generating plots via Python code.
  • Future updates aim to make more of these behaviors implicit (triggered without explicit prompting), add more tools, and extend Agentic Vision to more model sizes.
  • Agentic Vision is available via the Gemini API in Google AI Studio and Vertex AI.
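
The Think, Act, Observe loop summarized above can be pictured with a small toy sketch. This is not Gemini's implementation and does not call the Gemini API. It assumes a grayscale image stored as a nested list, a hard-coded plan in place of the model's reasoning, and hypothetical helper names (`think`, `act`, `run`, `crop`, `upscale`, `draw_box`) chosen only for illustration. It shows the shape of the loop: plan steps, run image-manipulation code, and append each result to the context.

```python
from dataclasses import dataclass, field

# Assumption: a grayscale image is a row-major nested list of ints.
Image = list[list[int]]
Box = tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(img: Image, box: Box) -> Image:
    """Return the sub-image inside the box."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def upscale(img: Image, factor: int) -> Image:
    """Nearest-neighbour zoom: repeat each pixel `factor` times in both directions."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def draw_box(img: Image, box: Box, value: int = 255) -> Image:
    """Return a copy of the image with the border of the box set to `value`."""
    left, top, right, bottom = box
    out = [row[:] for row in img]
    for x in range(left, right):
        out[top][x] = value
        out[bottom - 1][x] = value
    for y in range(top, bottom):
        out[y][left] = value
        out[y][right - 1] = value
    return out


@dataclass
class Context:
    """Stand-in for the context window: images seen so far plus a step log."""
    images: list[Image] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def think(query: str, image: Image) -> list[tuple[str, Box]]:
    """Think: hypothetical planner. A real model would derive the plan from the
    query and the image; here the plan is fixed to the lower-right quadrant."""
    height, width = len(image), len(image[0])
    region = (width // 2, height // 2, width, height)
    return [("annotate", region), ("zoom", region)]


def act(step: tuple[str, Box], image: Image) -> Image:
    """Act: run the image manipulation named by the step."""
    op, box = step
    if op == "annotate":
        return draw_box(image, box)
    if op == "zoom":
        return upscale(crop(image, box), 2)
    raise ValueError(f"unknown operation: {op}")


def run(query: str, image: Image) -> Context:
    """Run the loop once over the plan, appending each result to the context."""
    ctx = Context(images=[image])
    for step in think(query, image):
        result = act(step, image)   # Act on the original image
        ctx.images.append(result)   # Observe: the result joins the context
        ctx.log.append(step[0])
    return ctx
```

Calling `run("What is in the lower-right corner?", image)` on a small test image returns a context holding the original image, an annotated copy, and a zoomed crop. In the real service the plan and the code are produced by the model rather than hard-coded as they are here.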