Qwen2.5-VL (Vision-Language)

Qwen2.5-VL introduces native visual localization, structured output generation, and the ability to act as a visual agent that dynamically directs tools. Key features include understanding of videos longer than one hour, visual grounding that returns bounding boxes or points, and an efficient Vision Transformer (ViT) encoder that processes images at dynamic resolution. The model is available in 3B, 7B, and 72B parameter sizes.
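With a dynamic-resolution encoder, the number of visual tokens depends on the input image size instead of a fixed crop. The sketch below shows one way to compute that mapping. It is modeled on the resizing logic in the `qwen-vl-utils` helper package, but it is not that package's API. Several details are assumptions: the 28-pixel factor (14-pixel ViT patches merged 2x2 into one token), the `min_pixels`/`max_pixels` defaults, and the aspect-ratio limit. The function names `resize_for_encoder` and `visual_token_count` are hypothetical.

```python
import math

# Assumption: 14-pixel ViT patches, with 2x2 neighboring patches merged into
# one token, so each visual token covers a 28x28 pixel area.
FACTOR = 28
# Assumed pixel-budget bounds (illustrative defaults; configurable in practice).
MIN_PIXELS = 4 * FACTOR * FACTOR
MAX_PIXELS = 16384 * FACTOR * FACTOR
# Assumed limit on how elongated an input image may be.
MAX_ASPECT_RATIO = 200


def resize_for_encoder(height: int, width: int,
                       min_pixels: int = MIN_PIXELS,
                       max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Hypothetical helper: choose a (height, width) close to the original,
    with both sides multiples of FACTOR and the area kept roughly within
    [min_pixels, max_pixels] while approximately preserving aspect ratio."""
    if max(height, width) / min(height, width) > MAX_ASPECT_RATIO:
        raise ValueError("aspect ratio too extreme")
    # Snap each side to the nearest multiple of FACTOR (at least one token).
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w


def visual_token_count(height: int, width: int, **kwargs) -> int:
    """Hypothetical helper: visual tokens for one image after resizing,
    assuming one token per 28x28 pixel cell."""
    h, w = resize_for_encoder(height, width, **kwargs)
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    print(resize_for_encoder(1080, 1920))   # (1092, 1932)
    print(visual_token_count(1080, 1920))   # 2691
    # A tighter pixel budget yields fewer tokens for the same frame.
    print(visual_token_count(1080, 1920, max_pixels=1280 * FACTOR * FACTOR))
```

Under these assumptions, a 1080x1920 frame costs roughly 2.7k visual tokens at the default budget. Lowering `max_pixels` gives a shorter sequence at the cost of visual detail.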
