Model Summary

UnifiedReward-Flex-qwen35-27b is a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning!!

For further details, please refer to the following resources:

vLLM Server Deployment

export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-Flex-qwen35-27b \
 --host localhost \
 --port 8080 \
 --trust-remote-code \
 --served-model-name UnifiedReward \
 --gpu-memory-utilization 0.95 \
 --mm-encoder-tp-mode data \
 --mm-processor-cache-type shm \
 --enable-prefix-caching \
 --tensor-parallel-size 8 \
 --default-chat-template-kwargs '{"enable_thinking": false}'

The inference code is provided here.

Citation

@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}
Downloads last month
24
Safetensors
Model size
3.05M params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for CodeGoat24/UnifiedReward-Flex-qwen35-27b

Base model

Qwen/Qwen3.5-27B
Finetuned
(1)
this model
Quantizations
2 models

Dataset used to train CodeGoat24/UnifiedReward-Flex-qwen35-27b

Collection including CodeGoat24/UnifiedReward-Flex-qwen35-27b

Paper for CodeGoat24/UnifiedReward-Flex-qwen35-27b