Add model-index with benchmark evaluations

#6
by davidlms - opened

Added structured evaluation results from benchmark comparison (36 benchmarks):

General VQA:

  • MMBench V1.1: 88.8, MMBench V1.1 (CN): 88.2, MMStar: 75.9, BLINK (Val): 65.5, MUIRBENCH: 77.1

Multimodal Reasoning:

  • MMMU (Val): 76.0, MMMU_Pro: 66.0, VideoMMU: 74.7, MathVista: 85.2, AI2D: 88.8, DynaMath: 54.5, WeMath: 69.8, ZeroBench (sub): 25.8

Multimodal Agentic:

  • MMBrowseComp: 7.6, Design2Code: 88.6, Flame-React-Eval: 86.3, OSWorld: 37.2, AndroidWorld: 57.0, WebVoyager: 81.0, Webquest-SingleQA: 79.5, Webquest-MultiQA: 59.0

Multimodal Long Context:

  • MMLongBench-Doc: 54.9, MMLongBench-128K: 64.1, LVBench: 59.5

OCR & Chart:

  • OCRBench: 86.5, OCR-Bench_v2 (EN): 65.1, OCR-Bench_v2 (CN): 59.6, ChartQAPro: 65.5, ChartMuseum: 58.4, CharXiv_Val-Reasoning: 63.2

Spatial & Grounding:

  • OmniSpatial: 52.0, RefCOCO-avg (val): 88.6, TreeBench: 51.4, Ref-L4-test: 88.9

This enables the model to appear on leaderboards and makes it easier to compare it with other models.
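For context, `model-index` entries live in the model card's YAML front matter and follow the Hugging Face Hub metadata schema. A minimal sketch of what one of the entries above might look like (the model name, task type, and dataset `type` slug here are placeholders, not taken from this PR):

```yaml
model-index:
- name: example-model          # placeholder: the actual repo's model name
  results:
  - task:
      type: image-text-to-text  # assumed task type for a multimodal model
    dataset:
      name: MMBench V1.1
      type: mmbench             # placeholder dataset slug
    metrics:
    - type: accuracy
      value: 88.8
      name: MMBench V1.1 score
```

Each benchmark in the list becomes one `results` entry, which is what leaderboard tooling parses to surface the scores.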

davidlms changed pull request status to closed
