TraceGen Benchmark Leaderboard
Benchmark: TraceGen Evaluation Suite
We evaluate models on five environments (EpicKitchen, Droid, Bridge, Libero, and Robomimic) using the official TraceGen metrics. For each environment we report MSE, MAE, and Endpoint MSE on the held-out test split.
Testing on the TraceGen benchmark
Use the official evaluation code provided at https://github.com/jayLEE0301/TraceGen.
Multi-GPU
# Evaluate with 4 GPUs via torchrun; adjust CUDA_VISIBLE_DEVICES and --nproc_per_node to your setup
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
    test_benchmark.py \
    --config cfg/train.yaml \
    --override \
        train.batch_size=8 \
        train.lr_decoder=1.5e-4 \
        model.decoder.num_layers=6 \
        model.decoder.num_attention_heads=12 \
        model.decoder.latent_dim=768 \
        data.num_workers=4 \
        hardware.mixed_precision=true \
        logging.use_wandb=true \
        logging.log_every=2000 \
    --resume {path_to_pretrained_checkpoint}
Single-GPU
# Evaluate on a single GPU
export CUDA_VISIBLE_DEVICES=0
python test_benchmark.py \
    --config cfg/train.yaml \
    --override \
        train.batch_size=8 \
        train.lr_decoder=1.5e-4 \
        model.decoder.num_layers=6 \
        model.decoder.num_attention_heads=12 \
        model.decoder.latent_dim=768 \
        data.num_workers=4 \
        hardware.mixed_precision=true \
        logging.use_wandb=true \
        logging.log_every=2000 \
    --resume {path_to_pretrained_checkpoint}
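
The `--override` flag takes dotted `key=value` pairs that are applied on top of `cfg/train.yaml`. The actual parsing lives in `test_benchmark.py`; the snippet below is only a minimal sketch of how such dotted overrides are typically merged, assuming an OmegaConf-style nested config (an assumption, not the repository's confirmed implementation).

```python
# Minimal sketch (assumption): applying dotted key=value overrides to a YAML config.
# The real override handling is defined in test_benchmark.py and may differ.
from omegaconf import OmegaConf

base = OmegaConf.load("cfg/train.yaml")          # nested sections: train, model, data, ...
overrides = OmegaConf.from_dotlist([
    "train.batch_size=8",
    "model.decoder.latent_dim=768",
    "hardware.mixed_precision=true",
])
cfg = OmegaConf.merge(base, overrides)           # override values take precedence
print(OmegaConf.to_yaml(cfg))
```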
To reproduce the environment-specific benchmark results reported below, evaluate the corresponding environment-specific checkpoints TraceGen_{EnvName} from the TraceGen Collection; each of these checkpoints is trained only on data from its environment.
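
The loop below sketches how the per-environment checkpoints could be fetched from the Hugging Face Hub and passed to the evaluation script. The repo-id pattern `furonghuang-lab/TraceGen_{EnvName}` and the use of the downloaded directory as the `--resume` argument are assumptions; check the TraceGen Collection for the exact checkpoint names and paths.

```python
# Hypothetical reproduction loop: download each environment-specific checkpoint
# and evaluate it with the official script. Repo IDs and the --resume path are
# assumptions; consult the TraceGen Collection for the exact names.
import subprocess
from huggingface_hub import snapshot_download

ENVS = ["EpicKitchen", "Droid", "Bridge", "Libero", "Robomimic"]

for env in ENVS:
    ckpt_dir = snapshot_download(repo_id=f"furonghuang-lab/TraceGen_{env}")
    subprocess.run(
        ["python", "test_benchmark.py",
         "--config", "cfg/train.yaml",
         "--resume", ckpt_dir],  # point at the actual checkpoint file if required
        check=True,
    )
```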
Metric definition. All reported errors are computed in a normalized coordinate space: both input images and predicted traces are scaled to the range [0, 1] prior to evaluation. Accordingly, the reported MSE, MAE, and Endpoint MSE reflect absolute errors within the normalized image space.
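
For intuition, the sketch below computes the three errors on traces that are already normalized to [0, 1]. The tensor shapes and the convention that the endpoint is the last trace point are assumptions; the reported numbers come from the official evaluation script.

```python
import torch

def trace_errors(pred: torch.Tensor, target: torch.Tensor):
    """MSE, MAE, and Endpoint MSE for normalized 2D traces.

    Assumes pred and target have shape (batch, num_points, 2) with coordinates
    in [0, 1], and that "endpoint" means the last point of each trace. These
    conventions are illustrative; use the official script for reported results.
    """
    mse = torch.mean((pred - target) ** 2)
    mae = torch.mean(torch.abs(pred - target))
    endpoint_mse = torch.mean((pred[:, -1] - target[:, -1]) ** 2)
    return mse.item(), mae.item(), endpoint_mse.item()
```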
| Environment | Metric | TraceGen (×1e−2) |
|---|---|---|
| EpicKitchen | MSE | 0.445 |
| EpicKitchen | MAE | 2.721 |
| EpicKitchen | Endpoint MSE | 0.791 |
| Droid | MSE | 0.206 |
| Droid | MAE | 1.289 |
| Droid | Endpoint MSE | 0.285 |
| Bridge | MSE | 0.653 |
| Bridge | MAE | 2.419 |
| Bridge | Endpoint MSE | 0.607 |
| Libero | MSE | 0.276 |
| Libero | MAE | 1.442 |
| Libero | Endpoint MSE | 0.385 |
| Robomimic | MSE | 0.138 |
| Robomimic | MAE | 1.416 |
| Robomimic | Endpoint MSE | 0.151 |
Submitting to the Leaderboard
- Use the provided evaluation script: https://github.com/jayLEE0301/TraceGen
- Report metrics on the official test split, using the corresponding dataset from https://huggingface.co/collections/furonghuang-lab/tracegen
- For environment-specific results, evaluate the corresponding TraceGen_{EnvName} checkpoint.
- Open a PR or submit results via GitHub Issues.
