Model Card for ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps

This is a checkpoint saved from fine-tuning Qwen/Qwen3-4B-Base with the MaxRL objective introduced in our paper, "Maximum Likelihood Reinforcement Learning". MaxRL is a framework for optimizing maximum likelihood in RL settings.

Model Details

Model Description

This model card describes a Qwen/Qwen3-4B-Base model fine-tuned using MaxRL.

Finetuned from model: Qwen/Qwen3-4B-Base
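As a rough illustration of how this checkpoint might be loaded, here is a minimal sketch using the Hugging Face `transformers` library. The repository ID is the one shown in this page's collection listing. The helper names `load_checkpoint` and `generate`, the plain-text prompt in the usage note, and the generation settings are illustrative assumptions, not part of the official release. `device_map="auto"` additionally assumes the `accelerate` package is installed.

```python
# Repository ID as shown in this page's collection listing.
MODEL_ID = "ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps"


def load_checkpoint(model_id: str = MODEL_ID):
    """Load the tokenizer and model (hypothetical helper).

    Imports are local so that defining this function does not require
    torch or transformers to be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # weights are published in BF16; newer releases also accept `dtype`
        device_map="auto",           # assumption: `accelerate` is available
    )
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_checkpoint()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Question: What is 12 * 13?\nAnswer:")` downloads the weights on first use. Since the model was fine-tuned from a base model, a plain-text prompt is assumed here rather than a chat template; the prompt format used in training is defined by the training script.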

Model Sources

Repository: Official Code Release for the paper "Maximum Likelihood Reinforcement Learning"
Paper: Maximum Likelihood Reinforcement Learning (https://arxiv.org/abs/2602.02710)
Project Website: https://zanette-labs.github.io/MaxRL/

Training Details

Training Data

We train on the POLARIS-53K dataset to produce this checkpoint.

Training Procedure

To reproduce the training of this checkpoint, use the provided training script or, more generally, the published codebase. Hyperparameters and other details are given in the training script.

Due to computational constraints, we trained for 1000 steps and released the final checkpoint.

Hardware

This model was fine-tuned on 32 NVIDIA H200 GPUs (4 nodes with 8 H200 GPUs each).

Citation

BibTeX:

@misc{tajwar2026maximumlikelihoodreinforcementlearning,
      title={Maximum Likelihood Reinforcement Learning}, 
      author={Fahim Tajwar and Guanning Zeng and Yueer Zhou and Yuda Song and Daman Arora and Yiding Jiang and Jeff Schneider and Ruslan Salakhutdinov and Haiwen Feng and Andrea Zanette},
      year={2026},
      eprint={2602.02710},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02710}, 
}

Model Card Contact

Fahim Tajwar

Model size: 4B params
Tensor type: BF16 (Safetensors)
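For a rough sense of what the stated size and tensor type imply, the sketch below estimates the memory needed just to hold the weights. It assumes the nominal figure of 4 billion parameters (the exact count is not stated here) and 2 bytes per BF16 value. Activations, the KV cache, and framework overhead are not included, and `weight_memory_bytes` is a hypothetical helper.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to store the weights alone (BF16 uses 2 bytes per value)."""
    return num_params * bytes_per_param


# Assumption: nominal "4B params"; the true count may differ somewhat.
NOMINAL_PARAMS = 4_000_000_000

estimate = weight_memory_bytes(NOMINAL_PARAMS)
print(f"{estimate / 1e9:.1f} GB, {estimate / 2**30:.2f} GiB")  # prints "8.0 GB, 7.45 GiB"
```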

Collection

This checkpoint (ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps) is part of the MaxRL collection: Qwen3-Base post-trained checkpoints for our paper, Maximum Likelihood Reinforcement Learning (https://zanette-labs.github.io/MaxRL/).
