mackenzietechdocs committed
Commit 2a86ac6 · verified · 1 parent: cbc4f97

Add model-index evaluation metadata for LLaDA2.0-flash to README

- Added a `model-index` block to the README's YAML front matter, converting the existing benchmark table into structured evaluation metadata for LLaDA2.0-flash.
- The metrics are taken from the LLaDA2.0-flash column of the existing benchmark table.
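
The conversion described above can be sketched as follows. This is a minimal illustration, not the script used for this commit: the helper name `render_model_index` is hypothetical, the indentation layout is an assumption about how the block is written, and only three of the table rows are shown.

```python
def render_model_index(model_name, rows):
    """Render a model-index YAML block from (name, type, value) rows.

    Hypothetical helper: the task and dataset fields mirror the values
    used in this commit; the indentation scheme is an assumption.
    """
    lines = [
        "model-index:",
        f"- name: {model_name}",
        "  results:",
        "  - task:",
        "      name: Text Generation",
        "      type: text-generation",
        "    dataset:",
        "      name: Benchmarks",
        "      type: benchmarks",
        "    metrics:",
    ]
    for name, metric_type, value in rows:
        lines.append(f"    - name: {name}")
        lines.append(f"      type: {metric_type}")
        # Two decimals, matching the precision of the benchmark table.
        lines.append(f"      value: {value:.2f}")
    return "\n".join(lines)


# A few rows from the LLaDA2.0-flash column, as (name, type, value).
rows = [
    ("Average", "average", 79.32),
    ("MMLU", "mmlu", 87.69),
    ("GSM8K", "gsm8k", 96.06),
]
block = render_model_index("LLaDA2.0-flash", rows)
print(block)
```

Generating the block from the table rows, rather than typing it by hand, keeps the metadata and the human-readable table from drifting apart.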

Files changed (1)
  1. README.md +95 -0
README.md CHANGED
@@ -6,6 +6,101 @@ tags:
  - diffusion
  - llm
  - text_generation
+ model-index:
+ - name: LLaDA2.0-flash
+   results:
+   - task:
+       name: Text Generation
+       type: text-generation
+     dataset:
+       name: Benchmarks
+       type: benchmarks
+     metrics:
+     - name: Average
+       type: average
+       value: 79.32
+
+     # Knowledge
+     - name: MMLU
+       type: mmlu
+       value: 87.69
+     - name: MMLU-Pro
+       type: mmlu-pro
+       value: 73.36
+     - name: GPQA
+       type: gpqa
+       value: 61.98
+     - name: ARC-C
+       type: arc-c
+       value: 95.93
+     - name: CMMLU
+       type: cmmlu
+       value: 85.13
+     - name: C-EVAL
+       type: c-eval
+       value: 86.75
+     - name: GAOKAO-Bench
+       type: gaokao-bench
+       value: 93.90
+
+     # Reasoning
+     - name: SQuAD 2.0
+       type: squad-v2
+       value: 90.00
+     - name: DROP
+       type: drop
+       value: 87.90
+     - name: KOR-Bench
+       type: kor-bench
+       value: 64.24
+     - name: HellaSwag
+       type: hellaswag
+       value: 84.97
+
+     # Coding
+     - name: CRUXEval-O
+       type: cruxeval-o
+       value: 85.12
+     - name: MBPP
+       type: mbpp
+       value: 88.29
+     - name: MultiPL-E
+       type: multipl-e
+       value: 74.87
+     - name: HumanEval
+       type: humaneval
+       value: 94.51
+     - name: Bigcodebench-Full
+       type: bigcodebench-full
+       value: 41.58
+     - name: LiveCodeBench
+       type: livecodebench
+       value: 42.29
+     - name: Spider
+       type: spider
+       value: 82.49
+
+     # Math
+     - name: GSM8K
+       type: gsm8k
+       value: 96.06
+     - name: MATH
+       type: math
+       value: 95.44
+     - name: OlympiadBench
+       type: olympiadbench
+       value: 74.07
+     - name: AIME 2025
+       type: aime-2025
+       value: 60.00
+
+     # Agent & Alignment
+     - name: BFCL_Live
+       type: bfcl_live
+       value: 75.43
+     - name: IFEval-strict -prompt
+       type: ifeval-strict
+       value: 81.70
  ---
  # LLaDA2.0-flash
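
A quick consistency check on the added metadata, offered as a sketch: it assumes the `Average` entry is the unweighted mean of the 24 per-benchmark scores, which the commit itself does not state. The scores are copied from the `model-index` block in the diff.

```python
# Per-benchmark scores copied from the model-index block in this commit.
scores = {
    # Knowledge
    "MMLU": 87.69, "MMLU-Pro": 73.36, "GPQA": 61.98, "ARC-C": 95.93,
    "CMMLU": 85.13, "C-EVAL": 86.75, "GAOKAO-Bench": 93.90,
    # Reasoning
    "SQuAD 2.0": 90.00, "DROP": 87.90, "KOR-Bench": 64.24, "HellaSwag": 84.97,
    # Coding
    "CRUXEval-O": 85.12, "MBPP": 88.29, "MultiPL-E": 74.87, "HumanEval": 94.51,
    "Bigcodebench-Full": 41.58, "LiveCodeBench": 42.29, "Spider": 82.49,
    # Math
    "GSM8K": 96.06, "MATH": 95.44, "OlympiadBench": 74.07, "AIME 2025": 60.00,
    # Agent & Alignment
    "BFCL_Live": 75.43, "IFEval-strict -prompt": 81.70,
}

# Assumption: "Average" is the plain arithmetic mean over all benchmarks.
mean = sum(scores.values()) / len(scores)
print(len(scores), round(mean, 2))  # prints: 24 79.32
```

Under that assumption the recomputed mean agrees with the reported `Average` of 79.32 to two decimal places.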