allenai/Molmo2-8B • Image-Text-to-Text • 9B • 73.4k • 149
Video analysis models for action recognition, temporal understanding, and video content classification