Model Zoo
We provide pre-trained models for view synthesis with 3D Gaussian splatting and scale-consistent depth estimation from multi-view posed images.
We assume that the downloaded weights are stored in the `pretrained` directory. It's recommended to create a symbolic link from `YOUR_MODEL_PATH` to `pretrained` using `ln -s YOUR_MODEL_PATH pretrained`.
- To verify the integrity of downloaded files, each model file name on this page ends with a prefix of the file's SHA-256 checksum. Compare it against the output of `sha256sum filename`.
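
The same check can be scripted. The sketch below assumes that the checksum prefix is the last hyphen-separated token before `.pth`, as in the file names on this page. The function names are illustrative and not part of the DepthSplat code base.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the checksum prefix is the final hyphen-separated token of the
# file name, e.g. "cfeab6b1" in "...-view2-cfeab6b1.pth".
_PREFIX_RE = re.compile(r"-([0-9a-f]+)\.pth$")


def expected_prefix(path):
    """Return the hex checksum prefix embedded in the file name."""
    match = _PREFIX_RE.search(Path(path).name)
    if match is None:
        raise ValueError(f"no checksum prefix found in {path!r}")
    return match.group(1)


def sha256_hex(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path):
    """True if the file's SHA-256 digest starts with the prefix in its name."""
    return sha256_hex(path).startswith(expected_prefix(path))
```

For example, `checksum_matches("pretrained/depthsplat-gs-small-re10k-256x256-view2-cfeab6b1.pth")` should return `True` for an intact download.
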
Gaussian Splatting
The models are trained on RealEstate10K (re10k) and/or DL3DV (dl3dv) datasets at resolutions of 256x256, 256x448, and 448x768. The number of training views ranges from 2 to 10.
The "→" symbol indicates that the models are trained in two stages. For example, "re10k → (re10k+dl3dv)" means the model is first trained on the RealEstate10K dataset and then fine-tuned on a combination of the RealEstate10K and DL3DV datasets.
| Model | Training Data | Training Resolution | Training Views | Params (M) | Download |
|---|---|---|---|---|---|
| depthsplat-gs-small-re10k-256x256-view2-cfeab6b1.pth | re10k | 256x256 | 2 | 37 | download |
| depthsplat-gs-base-re10k-256x256-view2-ca7b6795.pth | re10k | 256x256 | 2 | 117 | download |
| depthsplat-gs-large-re10k-256x256-view2-e0f0f27a.pth | re10k | 256x256 | 2 | 360 | download |
| depthsplat-gs-base-re10k-256x448-view2-fea94f65.pth | re10k | 256x448 | 2 | 117 | download |
| depthsplat-gs-base-dl3dv-256x448-randview2-6-02c7b19d.pth | re10k → dl3dv | 256x448 | 2-6 | 117 | download |
| depthsplat-gs-small-re10kdl3dv-448x768-randview4-10-c08188db.pth | re10k → (re10k+dl3dv) | 256x448 → 448x768 | 4-10 | 37 | download |
| depthsplat-gs-base-re10kdl3dv-448x768-randview2-6-f8ddd845.pth | re10k → (re10k+dl3dv) | 256x448 → 448x768 | 2-6 | 117 | download |
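
The file names on this page appear to follow a common pattern: task, model size, an optional dataset tag, resolution, number of views, and checksum prefix. This pattern is inferred from the listed names and is not a documented convention, so the regular expression and the `parse_model_name` helper below are only a hypothetical sketch for picking a checkpoint programmatically.

```python
import re

# Assumption: pattern inferred from the file names listed on this page.
_NAME_RE = re.compile(
    r"^depthsplat-(?P<task>gs|depth)-(?P<size>small|base|large)"
    r"(?:-(?P<data>[a-z0-9]+))?"                 # dataset tag, absent for depth models
    r"-(?P<res_a>\d+)x(?P<res_b>\d+)"            # resolution, in file-name order
    r"-(?:rand)?view(?P<min_views>\d+)(?:-(?P<max_views>\d+))?"
    r"-(?P<sha256_prefix>[0-9a-f]+)\.pth$"
)


def parse_model_name(name):
    """Split a model file name into its fields (hypothetical helper)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model file name: {name!r}")
    min_views = int(match["min_views"])
    # A fixed view count such as "view2" is reported as the range (2, 2).
    max_views = int(match["max_views"]) if match["max_views"] else min_views
    return {
        "task": match["task"],
        "size": match["size"],
        "data": match["data"],  # None when the name carries no dataset tag
        "resolution": (int(match["res_a"]), int(match["res_b"])),
        "views": (min_views, max_views),
        "sha256_prefix": match["sha256_prefix"],
    }
```

With this sketch, `parse_model_name("depthsplat-gs-base-dl3dv-256x448-randview2-6-02c7b19d.pth")` yields the task `"gs"`, size `"base"`, resolution `(256, 448)`, and view range `(2, 6)`.
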
Depth Prediction
- The depth models are trained with the following procedure:
  - Initialize the monocular feature with Depth Anything V2 and the multi-view Transformer with UniMatch.
  - Train the full DepthSplat model end-to-end on the mixed RealEstate10K and DL3DV datasets.
  - Fine-tune the pre-trained depth model on the depth datasets with ground truth depth supervision. The depth datasets used for fine-tuning include ScanNet, TartanAir, and VKITTI2.
- The depth models are fine-tuned with a random number of input images (2 to 8), and the training image resolution is 352x640.
- The scale of the predicted depth is aligned with the scale of the camera poses' translations.
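
The last point has a practical consequence, illustrated below with plain pinhole geometry rather than with the model itself. If the camera translations are expressed in units that are `s` times larger, the depth consistent with the same images is also `s` times larger. The camera model (identity rotation, unit focal length) and all names and values are assumptions chosen to keep the sketch short.

```python
def project(point, cam_center, focal=1.0):
    """Project a 3D point with a pinhole camera (identity rotation assumed).

    Returns (u, v, depth), where depth is the distance along the optical axis.
    """
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return focal * x / z, focal * y / z, z


def scale_vec(vec, s):
    """Scale a 3D vector, i.e. express it in different units."""
    return tuple(s * v for v in vec)


point = (0.4, -0.2, 3.0)  # hypothetical 3D point
cam = (0.1, 0.0, 0.0)     # hypothetical camera center (pose translation)
s = 2.5                   # hypothetical change of pose units

u, v, depth = project(point, cam)
u_s, v_s, depth_s = project(scale_vec(point, s), scale_vec(cam, s))

# The pixel coordinates (u, v) are unchanged by the rescaling,
# while the depth grows by the same factor as the translation.
```

Under these assumptions, the predicted depth is expected to be in whatever unit the input pose translations use.
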
| Model | Training Data | Training Resolution | Training Views | Params (M) | Download |
|---|---|---|---|---|---|
| depthsplat-depth-small-352x640-randview2-8-e807bd82.pth | (re10k+dl3dv) → (scannet+tartanair+vkitti2) | 448x768 → 352x640 | 2-8 | 36 | download |
| depthsplat-depth-base-352x640-randview2-8-65a892c5.pth | (re10k+dl3dv) → (scannet+tartanair+vkitti2) | 448x768 → 352x640 | 2-8 | 111 | download |