TraceGen Benchmark Leaderboard

TraceGen Benchmark Overview

Benchmark: TraceGen Evaluation Suite

We evaluate models on five environments (EpicKitchen, Droid, Bridge, Libero, and Robomimic) using the official TraceGen metrics. Each environment reports MSE, MAE, and Endpoint MSE on held-out test sets.

Testing on the TraceGen benchmark

Use the official evaluation code provided at https://github.com/jayLEE0301/TraceGen.

Multi-GPU

export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
  test_benchmark.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000 \
  --resume {path_to_pretrained_checkpoint}

Single-GPU

export CUDA_VISIBLE_DEVICES=0
python test_benchmark.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000 \
  --resume {path_to_pretrained_checkpoint}

To reproduce the environment-specific benchmark results reported below, evaluate the environment-specific checkpoints TraceGen_{EnvName} from the TraceGen Collection; each checkpoint is trained only on data from the corresponding environment.

Metric definition. All reported errors are computed in a normalized coordinate space: both input images and predicted traces are scaled to the range [0, 1] prior to evaluation. Accordingly, the reported MSE, MAE, and Endpoint MSE reflect absolute errors within the normalized image space.
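As a concrete reading of this definition, here is a minimal sketch of the three metrics. The function name `trace_metrics` and the (T, 2) trace layout are our assumptions for illustration, not part of the TraceGen codebase; both traces are assumed to already be in normalized [0, 1] image coordinates.

```python
import numpy as np

def trace_metrics(pred, gt):
    """Compute MSE, MAE, and Endpoint MSE between a predicted and a
    ground-truth trace, each an array of shape (T, 2) holding (x, y)
    points normalized to [0, 1] image coordinates."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)           # mean squared error over all points
    mae = np.mean(np.abs(pred - gt))          # mean absolute error over all points
    endpoint_mse = np.mean((pred[-1] - gt[-1]) ** 2)  # error on the final point only
    return mse, mae, endpoint_mse
```

Because coordinates live in [0, 1], a uniform offset of 0.1 between the traces yields an MSE of 0.01 and an MAE of 0.1.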

Environment    MSE     MAE     Endpoint MSE
EpicKitchen    0.445   2.721   0.791
Droid          0.206   1.289   0.285
Bridge         0.653   2.419   0.607
Libero         0.276   1.442   0.385
Robomimic      0.138   1.416   0.151

All TraceGen values are ×1e−2.
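To make the ×1e−2 scaling concrete: the reported Robomimic MSE of 0.138 corresponds to an absolute MSE of 0.00138 in normalized [0, 1] coordinates, i.e. a root-mean-square pointwise error of roughly 3.7% of the image side. A quick check of that arithmetic:

```python
import math

reported_mse = 0.138                  # table value, scaled by 1e-2
absolute_mse = reported_mse * 1e-2    # MSE in normalized [0, 1] space
rms_error = math.sqrt(absolute_mse)   # RMS pointwise error, ~0.037
```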

Submitting to the Leaderboard

