---
license: apache-2.0
library_name: transformers
tags:
  - dllm
  - diffusion
  - llm
  - text_generation
model-index:
  - name: LLaDA2.0-flash
    results:
      - task:
          name: Text Generation
          type: text-generation
        dataset:
          name: Benchmarks
          type: benchmarks
        metrics:
          - name: Average
            type: average
            value: 79.32
          - name: MMLU
            type: mmlu
            value: 87.69
          - name: MMLU-Pro
            type: mmlu-pro
            value: 73.36
          - name: GPQA
            type: gpqa
            value: 61.98
          - name: ARC-C
            type: arc-c
            value: 95.93
          - name: CMMLU
            type: cmmlu
            value: 85.13
          - name: C-EVAL
            type: c-eval
            value: 86.75
          - name: GAOKAO-Bench
            type: gaokao-bench
            value: 93.9
          - name: SQuAD 2.0
            type: squad-v2
            value: 90
          - name: DROP
            type: drop
            value: 87.9
          - name: KOR-Bench
            type: kor-bench
            value: 64.24
          - name: HellaSwag
            type: hellaswag
            value: 84.97
          - name: CRUXEval-O
            type: cruxeval-o
            value: 85.12
          - name: MBPP
            type: mbpp
            value: 88.29
          - name: MultiPL-E
            type: multipl-e
            value: 74.87
          - name: HumanEval
            type: humaneval
            value: 94.51
          - name: Bigcodebench-Full
            type: bigcodebench-full
            value: 41.58
          - name: LiveCodeBench
            type: livecodebench
            value: 42.29
          - name: Spider
            type: spider
            value: 82.49
          - name: GSM8K
            type: gsm8k
            value: 96.06
          - name: MATH
            type: math
            value: 95.44
          - name: OlympiadBench
            type: olympiadbench
            value: 74.07
          - name: AIME 2025
            type: aime-2025
            value: 60
          - name: BFCL_Live
            type: bfcl_live
            value: 75.43
          - name: IFEval-strict-prompt
            type: ifeval-strict
            value: 81.7
---

# LLaDA2.0-flash

LLaDA2.0-flash is a diffusion language model built on a 100BA6B Mixture-of-Experts (MoE) architecture (100B total parameters, roughly 6B activated per token). As an enhanced, instruction-tuned iteration of the LLaDA2.0 series, it is optimized for practical applications.


| Benchmark | Qwen3-30B-A3B-Instruct-2507 | Ling-flash-2.0 | LLaDA2.0-flash-preview | LLaDA2.0-flash |
|---|---|---|---|---|
| **Average** | 79.47 | 78.03 | 71.92 | 79.32 |
| **Knowledge** | | | | |
| MMLU | 87.13 | 87.98 | 83.15 | 87.69 |
| MMLU-Pro | 74.23 | 76.84 | 49.22 | 73.36 |
| GPQA | 57.34 | 67.12 | 46.59 | 61.98 |
| ARC-C | 95.81 | 95.08 | 93.90 | 95.93 |
| CMMLU | 86.36 | 86.59 | 67.53 | 85.13 |
| C-EVAL | 88.17 | 88.03 | 66.54 | 86.75 |
| GAOKAO-Bench | 94.53 | 93.24 | 86.12 | 93.90 |
| **Reasoning** | | | | |
| SQuAD 2.0 | 89.51 | 81.32 | 85.61 | 90.00 |
| DROP | 87.57 | 88.32 | 79.49 | 87.90 |
| KOR-Bench | 68.00 | 68.96 | 37.26 | 64.24 |
| HellaSwag | 86.31 | 81.59 | 86.00 | 84.97 |
| **Coding** | | | | |
| CRUXEval-O | 86.75 | 82.75 | 61.88 | 85.12 |
| MBPP | 86.65 | 85.01 | 77.75 | 88.29 |
| MultiPL-E | 70.67 | 65.76 | 62.43 | 74.87 |
| HumanEval | 93.29 | 85.98 | 80.49 | 94.51 |
| BigCodeBench-Full | 41.49 | 40.70 | 30.44 | 41.58 |
| LiveCodeBench | 41.63 | 44.11 | 28.58 | 42.29 |
| Spider | 81.79 | 80.58 | 81.37 | 82.49 |
| **Math** | | | | |
| GSM8K | 96.36 | 95.45 | 89.01 | 96.06 |
| MATH | 96.70 | 96.10 | 73.50 | 95.44 |
| OlympiadBench | 77.59 | 76.19 | 47.78 | 74.07 |
| AIME 2025 | 61.88 | 55.89 | 23.33 | 60.00 |
| **Agent & Alignment** | | | | |
| BFCL_Live | 73.19 | 67.57 | 74.11 | 75.43 |
| IFEval-strict-prompt | 84.29 | 81.52 | 62.50 | 81.70 |

## 🚀 Performance Highlights

- **Leading MoE Architecture**: An open-source Mixture-of-Experts (MoE) diffusion large language model, continually trained from the Ling2.0 series on approximately 20 trillion tokens.
- **Efficient Inference**: Of the 100 billion total parameters, only 6.1 billion are activated during inference. LLaDA2.0-flash significantly reduces computational cost while outperforming open-source dense models of similar scale.
- **Impressive Performance on Code & Complex Reasoning**: Excels at tasks such as code generation and advanced mathematical reasoning.
- **Tool Use**: Supports tool calling and achieves excellent performance on complex agent-based tasks.
- **Open & Extensible**: Fully open source, with a commitment to transparency. We plan to release a leading inference framework and will continue investing in cutting-edge areas such as diffusion LLMs (dLLM) to drive disruptive innovation.

πŸ—ΊοΈ What's Next

- **Supercharged Reasoning with LLaDA 2.0**: The LLaDA 2.0 series will be fine-tuned with reinforcement learning, unlocking a new level of sophisticated reasoning and problem-solving abilities.
- **Tools for Innovators**: The model was fine-tuned with the dFactory framework using Fully Sharded Data Parallel (FSDP2). We have begun open-sourcing dFactory and will continue to release our advanced post-training technologies. Whether you want to master the current model or build your own customized versions, you'll have the tools you need. Stay tuned for more updates!

## 📦 Model Variants

| Model ID | Description | Hugging Face Link |
|---|---|---|
| inclusionAI/LLaDA2.0-mini | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-mini) |
| inclusionAI/LLaDA2.0-flash | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA2.0-flash) |

πŸ” Model Overview

LLaDA2.0-flash has the following specifications:

- **Type**: Mixture-of-Experts (MoE) Diffusion Language Model
- **Total Parameters (Non-Embedding)**: 100B
- **Number of Layers**: 32
- **Attention Heads**: 32
- **Context Length**: 32,768 tokens
- **Position Embedding**: Rotary (RoPE)
- **Vocabulary Size**: 157,184
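
If you want to check these numbers against the checkpoint itself, the configuration can be inspected without loading the weights. The snippet below is a minimal sketch: LLaDA2.0 ships a custom config class via `trust_remote_code`, so the conventional Transformers attribute names it reads are an assumption and may differ in the actual config.

```python
from transformers import AutoConfig

# Inspect the published checkpoint's configuration (no weights are downloaded).
config = AutoConfig.from_pretrained("inclusionAI/LLaDA2.0-flash", trust_remote_code=True)

# Conventional Transformers field names; the custom LLaDA2.0 config may use others.
for field, expected in [
    ("num_hidden_layers", 32),
    ("num_attention_heads", 32),
    ("max_position_embeddings", 32768),
    ("vocab_size", 157184),
]:
    print(field, getattr(config, field, "n/a"), "expected:", expected)
```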

## 🤗 Hugging Face Transformers

Make sure you have transformers and its dependencies installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "inclusionAI/LLaDA2.0-flash"
device = "auto"

# Load the instruction-tuned diffusion LLM in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build a chat-formatted prompt.
prompt = "Why does Camus think that Sisyphus is happy?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# Diffusion-style decoding: gen_length caps the output,
# block_length and steps control the denoising schedule.
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    steps=32,
    temperature=0.0,
)
generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```

### Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**: We suggest `temperature=0.0`, `block_length=32`, and `steps=32`. A higher temperature may occasionally cause language mixing and a slight decrease in performance.

2. **Adequate Output Length**: We recommend an output length of 32,768 tokens for most queries (both recommendations are combined in the sketch below).
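
The sketch below applies these recommendations to the loading code from the previous section. It reuses `model`, `tokenizer`, and `input_ids` from that example and assumes, as there, that `gen_length` is the argument that caps the output length; treat it as a starting point rather than a definitive recipe.

```python
# Recommended decoding settings (see Best Practices above), reusing the
# `model`, `tokenizer`, and `input_ids` objects from the earlier example.
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,   # stop once an end-of-sequence token is produced
    gen_length=32768,      # generous output budget, per the recommendation above
    block_length=32,       # decode in 32-token blocks
    steps=32,              # recommended number of diffusion sampling steps
    temperature=0.0,       # deterministic decoding; higher values may cause language mixing
)
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```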


## 🌐 License

This project is licensed under the terms of the Apache License 2.0.


## 🤝 Contact & Collaboration

For questions, collaborations, or feedback, please reach out via Hugging Face or open an issue in the repository.

👉 Join us in advancing open, efficient, and intelligent language models!