linhaotong committed on
Commit 4845d25 · 1 Parent(s): b4fbfcd
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. DEPLOYMENT_CHECKLIST.md +323 -0
  2. GSPLAT_SOLUTIONS.md +348 -0
  3. HF_SPACES_BUILD.md +306 -0
  4. PYTHON_VERSION_CONFIG.md +290 -0
  5. README.md +1 -0
  6. SPACES_SETUP.md +190 -0
  7. app.py +73 -0
  8. example_spaces_gpu.py +52 -0
  9. packages.txt +3 -0
  10. pyproject.toml +93 -0
  11. requirements-basic.txt +41 -0
  12. requirements.txt +38 -0
  13. runtime.txt +2 -0
  14. src/depth_anything_3/__pycache__/api.cpython-311.pyc +0 -0
  15. src/depth_anything_3/__pycache__/cfg.cpython-311.pyc +0 -0
  16. src/depth_anything_3/__pycache__/cli.cpython-311.pyc +0 -0
  17. src/depth_anything_3/__pycache__/registry.cpython-311.pyc +0 -0
  18. src/depth_anything_3/__pycache__/specs.cpython-311.pyc +0 -0
  19. src/depth_anything_3/api.py +414 -0
  20. src/depth_anything_3/app/__pycache__/css_and_html.cpython-311.pyc +0 -0
  21. src/depth_anything_3/app/__pycache__/gradio_app.cpython-311.pyc +0 -0
  22. src/depth_anything_3/app/css_and_html.py +594 -0
  23. src/depth_anything_3/app/gradio_app.py +747 -0
  24. src/depth_anything_3/app/modules/__init__.py +45 -0
  25. src/depth_anything_3/app/modules/__pycache__/__init__.cpython-311.pyc +0 -0
  26. src/depth_anything_3/app/modules/__pycache__/event_handlers.cpython-311.pyc +0 -0
  27. src/depth_anything_3/app/modules/__pycache__/file_handlers.cpython-311.pyc +0 -0
  28. src/depth_anything_3/app/modules/__pycache__/model_inference.cpython-311.pyc +0 -0
  29. src/depth_anything_3/app/modules/__pycache__/ui_components.cpython-311.pyc +0 -0
  30. src/depth_anything_3/app/modules/__pycache__/utils.cpython-311.pyc +0 -0
  31. src/depth_anything_3/app/modules/__pycache__/visualization.cpython-311.pyc +0 -0
  32. src/depth_anything_3/app/modules/event_handlers.py +629 -0
  33. src/depth_anything_3/app/modules/file_handlers.py +304 -0
  34. src/depth_anything_3/app/modules/model_inference.py +286 -0
  35. src/depth_anything_3/app/modules/ui_components.py +474 -0
  36. src/depth_anything_3/app/modules/utils.py +211 -0
  37. src/depth_anything_3/app/modules/visualization.py +434 -0
  38. src/depth_anything_3/cfg.py +144 -0
  39. src/depth_anything_3/cli.py +742 -0
  40. src/depth_anything_3/configs/da3-base.yaml +45 -0
  41. src/depth_anything_3/configs/da3-giant.yaml +71 -0
  42. src/depth_anything_3/configs/da3-large.yaml +45 -0
  43. src/depth_anything_3/configs/da3-small.yaml +45 -0
  44. src/depth_anything_3/configs/da3metric-large.yaml +28 -0
  45. src/depth_anything_3/configs/da3mono-large.yaml +28 -0
  46. src/depth_anything_3/configs/da3nested-giant-large.yaml +10 -0
  47. src/depth_anything_3/model/__init__.py +20 -0
  48. src/depth_anything_3/model/__pycache__/__init__.cpython-311.pyc +0 -0
  49. src/depth_anything_3/model/__pycache__/cam_dec.cpython-311.pyc +0 -0
  50. src/depth_anything_3/model/__pycache__/cam_enc.cpython-311.pyc +0 -0
DEPLOYMENT_CHECKLIST.md ADDED
@@ -0,0 +1,323 @@
+ # 🚀 Hugging Face Spaces Deployment Checklist
+
+ ## ✅ Current Configuration Status
+
+ ### Core Files (Required)
+
+ - ✅ **app.py** - Entry point, with the `@spaces.GPU` decorator
+ - ✅ **requirements.txt** - Python dependencies (includes gsplat)
+ - ✅ **README.md** - Space configuration (Python 3.11)
+ - ✅ **packages.txt** - System dependencies (build-essential, git)
+ - ✅ **pyproject.toml** - Project configuration
+
+ ### Fallback Files (Optional)
+
+ - ✅ **requirements-basic.txt** - Variant without gsplat (use it if the build fails)
+ - ✅ **runtime.txt** - Fallback Python version configuration
+ - ✅ **GSPLAT_SOLUTIONS.md** - Solutions for gsplat problems
+ - ✅ **SPACES_SETUP.md** - Detailed deployment guide
+
+ ---
+
+ ## 📋 Pre-Deployment Checks
+
+ ### 1. File Check
+
+ ```bash
+ # Confirm that all required files exist
+ [ -f app.py ] && echo "✅ app.py" || echo "❌ app.py missing"
+ [ -f requirements.txt ] && echo "✅ requirements.txt" || echo "❌ requirements.txt missing"
+ [ -f README.md ] && echo "✅ README.md" || echo "❌ README.md missing"
+ [ -d src/depth_anything_3 ] && echo "✅ Source code" || echo "❌ Source code missing"
+ ```
+
+ ### 2. Configuration Check
+
+ **README.md must contain:**
+ ```yaml
+ ---
+ sdk: gradio
+ app_file: app.py
+ python_version: 3.11
+ ---
+ ```
+
+ **requirements.txt must contain:**
+ ```txt
+ torch>=2.0.0
+ gradio>=5.0.0
+ spaces
+ gsplat @ git+https://... # if 3DGS is needed
+ ```
+
+ **app.py must contain:**
+ ```python
+ import spaces
+ @spaces.GPU(duration=120)
+ def gpu_run_inference(self, *args, **kwargs):
+     ...
+ ```
+
+ ### 3. Local Testing (Recommended)
+
+ ```bash
+ # Check the Python version
+ python --version  # should be 3.11+
+
+ # Install the dependencies
+ pip install -r requirements.txt
+
+ # Launch the app
+ python app.py
+
+ # Test gsplat (if needed)
+ python -c "import gsplat; print('✅ gsplat OK')"
+ ```
+
+ ---
+
+ ## 🎯 Deployment Steps
+
+ ### Option A: Via the Web Interface
+
+ 1. **Create a Space**
+    - Go to https://huggingface.co/new-space
+    - Space name: enter a name
+    - SDK: choose **Gradio**
+    - Hardware: choose **GPU (T4 or higher)**
+    - Visibility: Public/Private
+
+ 2. **Upload the files**
+    - Upload everything (app.py, requirements.txt, src/, etc.)
+    - Or push them via Git
+
+ 3. **Wait for the build**
+    - Watch the "Build logs" tab
+    - The first build can take 10-20 minutes (because of gsplat)
+
+ 4. **Test the app**
+    - It starts automatically once the build succeeds
+    - Test all features
+
+ ### Option B: Via Git
+
+ ```bash
+ # 1. Create the Space (via the web interface)
+
+ # 2. Clone the Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+ cd YOUR_SPACE_NAME
+
+ # 3. Copy the files
+ cp -r /path/to/depth-anything-3/* .
+
+ # 4. Commit and push
+ git add .
+ git commit -m "Initial deployment"
+ git push
+
+ # 5. Check the build logs
+ # (in the web interface)
+ ```
+
+ ---
+
+ ## 🐛 Quick Fixes for Common Problems
+
+ ### Problem 1: gsplat build fails ⚠️
+
+ **Symptom:**
+ ```
+ Building wheel for gsplat (setup.py) ... error
+ ```
+
+ **Quick fix:**
+ ```bash
+ # Option 1: switch to the variant without gsplat
+ mv requirements.txt requirements-full.txt
+ mv requirements-basic.txt requirements.txt
+ git commit -am "Use basic requirements without gsplat"
+ git push
+ ```
+
+ **Or in the web interface:**
+ 1. Open requirements.txt
+ 2. Comment out the gsplat line: `# gsplat @ git+...`
+ 3. Commit the change
+
+ See `GSPLAT_SOLUTIONS.md` for details.
+
+ ### Problem 2: Build timeout
+
+ **Symptom:**
+ ```
+ Build timeout after 60 minutes
+ ```
+
+ **Fix:**
+ 1. Use requirements-basic.txt (without gsplat)
+ 2. Or contact HF support to raise the build time limit
+
+ ### Problem 3: App fails to start
+
+ **Symptom:**
+ ```
+ ModuleNotFoundError: No module named 'depth_anything_3'
+ ```
+
+ **Fix:**
+ 1. Confirm that the `src/` directory structure is correct
+ 2. Add this at the top of app.py:
+ ```python
+ import sys
+ sys.path.append('./src')
+ ```
+
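+ If the Space's working directory differs from the repository root, a plain `./src` entry can still miss. A slightly more robust variant (a sketch, not part of the shipped app.py; it only assumes the `src/` layout shown above) resolves the path from the location of app.py itself:
+
+ ```python
+ # Hypothetical snippet for the top of app.py: add the repo's src/ directory
+ # to sys.path based on this file's location, not the current working directory.
+ import sys
+ from pathlib import Path
+
+ SRC_DIR = Path(__file__).resolve().parent / "src"
+ if SRC_DIR.is_dir() and str(SRC_DIR) not in sys.path:
+     sys.path.insert(0, str(SRC_DIR))
+ ```
+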
+ ### Problem 4: GPU not available
+
+ **Symptom:**
+ ```
+ torch.cuda.is_available() = False
+ ```
+
+ **Fix:**
+ 1. Confirm that the Space hardware is set to **GPU** (not CPU)
+ 2. Switch to GPU hardware in Settings
+ 3. A paid GPU may be required (T4 is the cheapest)
+
+ ---
+
+ ## 📊 Estimated Build Times
+
+ | Configuration | First build | Later builds | Startup time |
+ |------|---------|---------|---------|
+ | With gsplat | 15-25 min | 2-5 min* | 30-60 s |
+ | Without gsplat | 5-10 min | 1-2 min* | 20-40 s |
+
+ *Later builds may hit the cache
+
+ ---
+
+ ## 🎓 Post-Deployment Test Checklist
+
+ ### Basic Functionality
+
+ - [ ] App starts successfully
+ - [ ] Space URL is reachable
+ - [ ] UI renders correctly
+ - [ ] Images/videos can be uploaded
+
+ ### Depth Estimation
+
+ - [ ] Depth estimation runs
+ - [ ] Results are displayed correctly
+ - [ ] Point cloud visualization works
+ - [ ] Camera poses are displayed correctly
+
+ ### 3DGS Features (if gsplat is enabled)
+
+ - [ ] 3DGS option is visible
+ - [ ] 3DGS videos can be generated
+ - [ ] Videos play back
+
+ ### Performance
+
+ - [ ] GPU is detected correctly
+ - [ ] Inference speed is reasonable (no timeouts)
+ - [ ] Memory usage is normal
+
+ ---
+
+ ## 💾 Configuration Quick Reference
+
+ ### README.md
+ ```yaml
+ ---
+ title: Depth Anything 3
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ python_version: 3.11
+ ---
+ ```
+
+ ### Key parts of app.py
+ ```python
+ import spaces
+ from depth_anything_3.app.gradio_app import DepthAnything3App
+
+ original_run_inference = ModelInference.run_inference
+
+ @spaces.GPU(duration=120)
+ def gpu_run_inference(self, *args, **kwargs):
+     return original_run_inference(self, *args, **kwargs)
+
+ ModelInference.run_inference = gpu_run_inference
+
+ if __name__ == "__main__":
+     app = DepthAnything3App(...)
+     app.launch(host="0.0.0.0", port=7860)
+ ```
+
+ ### Key dependencies in requirements.txt
+ ```txt
+ torch>=2.0.0
+ gradio>=5.0.0
+ spaces
+ gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70
+ ```
+
+ ### packages.txt
+ ```txt
+ build-essential
+ git
+ ```
+
+ ---
+
+ ## 🔗 Related Documents
+
+ Detailed documentation in this project:
+
+ 1. **SPACES_SETUP.md** - Full deployment guide and how Spaces works
+ 2. **GSPLAT_SOLUTIONS.md** - The various ways to install gsplat
+ 3. **HF_SPACES_BUILD.md** - How the HF Spaces build process works
+ 4. **PYTHON_VERSION_CONFIG.md** - Python version configuration
+
+ External resources:
+
+ - [HF Spaces documentation](https://huggingface.co/docs/hub/spaces)
+ - [Gradio documentation](https://gradio.app/docs)
+ - [gsplat GitHub](https://github.com/nerfstudio-project/gsplat)
+
+ ---
+
+ ## 📞 Getting Help
+
+ If you run into problems:
+
+ 1. **Check the build logs** - the "Build logs" tab on the Space page
+ 2. **Check the runtime logs** - the "Logs" tab on the Space page
+ 3. **Read the docs** - the *.md documents in this project
+ 4. **HF forum** - https://discuss.huggingface.co/
+ 5. **GitHub Issues** - the project's Issues page
+
+ ---
+
+ ## ✨ After a Successful Deployment
+
+ Congratulations! 🎉 Your Depth Anything 3 app is now running on HF Spaces!
+
+ **Next steps:**
+
+ 1. 📝 Update README.md with usage instructions
+ 2. 🎨 Customize the UI (if needed)
+ 3. 📊 Monitor usage
+ 4. 🔄 Keep improving based on feedback
+
+ **Share your Space:**
+ - Space URL: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
+ - It can be embedded in web pages, blogs, etc.
+
+ Enjoy! 🚀
+
GSPLAT_SOLUTIONS.md ADDED
@@ -0,0 +1,348 @@
+ # gsplat Installation Solutions
+
+ ## 🎯 Problem
+
+ `gsplat` is a CUDA-accelerated 3D Gaussian Splatting library; installing it from source can fail on HF Spaces.
+
+ ## ✅ Solutions (in order of preference)
+
+ ---
+
+ ## Solution 1️⃣: Install Directly from GitHub ⭐ (currently configured)
+
+ **requirements.txt:**
+ ```txt
+ gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70
+ ```
+
+ **Pros:**
+ - ✅ Pins a specific revision, so it is stable
+ - ✅ Latest features
+ - ✅ Compatible with this codebase
+
+ **Cons:**
+ - ⚠️ Long build time (5-15 minutes)
+ - ⚠️ CUDA must be available at build time
+ - ⚠️ The build may fail
+
+ **How to test:**
+ ```bash
+ # Local test (make sure a GPU is available)
+ pip install 'gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70'
+ python -c "import gsplat; print(gsplat.__version__)"
+ ```
+
+ **Suggested HF Spaces configuration:**
+
+ If the build fails, in the Space settings:
+ 1. Choose a **GPU Space** (not a CPU Space)
+ 2. Pick at least a **T4** GPU or better
+ 3. The GPU is needed already at build time
+
+ ---
+
+ ## Solution 2️⃣: Use a Prebuilt Wheel (if available)
+
+ **Check whether a prebuilt release exists:**
+ ```bash
+ pip index versions gsplat
+ ```
+
+ If a PyPI release exists, change requirements.txt to:
+ ```txt
+ # Use the PyPI release (faster)
+ gsplat>=0.1.0
+ ```
+
+ **Pros:**
+ - ✅ Fast install (seconds)
+ - ✅ No compilation
+ - ✅ More reliable
+
+ **Cons:**
+ - ⚠️ May be an older version
+ - ⚠️ A prebuilt wheel may not exist
+
+ ---
+
+ ## Solution 3️⃣: Lazy-Load gsplat (recommended fallback) ⭐
+
+ If the build fails, change the code so gsplat becomes an optional dependency:
+
+ ### Step 1: Split requirements.txt
+
+ Create two files:
+
+ **requirements.txt** (base dependencies):
+ ```txt
+ torch>=2.0.0
+ gradio>=5.0.0
+ spaces
+ # ... other base dependencies
+ ```
+
+ **requirements-gsplat.txt** (optional dependencies):
+ ```txt
+ -r requirements.txt
+ gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70
+ ```
+
+ ### Step 2: Make gsplat optional in the code
+
+ **depth_anything_3/utils/export/gs.py** (or the relevant file):
+ ```python
+ # At the top of the file
+ try:
+     import gsplat
+     GSPLAT_AVAILABLE = True
+ except ImportError:
+     GSPLAT_AVAILABLE = False
+     print("⚠️ gsplat not installed. 3DGS features will be disabled.")
+
+ def export_to_gs_video(*args, **kwargs):
+     if not GSPLAT_AVAILABLE:
+         raise RuntimeError(
+             "gsplat is not installed. Please install it with:\n"
+             "pip install 'gsplat @ git+https://github.com/...'"
+         )
+     # original code...
+ ```
+
+ **app.py** (or gradio_app.py):
+ ```python
+ from depth_anything_3.utils.export.gs import GSPLAT_AVAILABLE
+
+ # Hide the 3DGS option in the UI when it is unavailable
+ if GSPLAT_AVAILABLE:
+     infer_gs = gr.Checkbox(label="Infer 3D Gaussian Splatting")
+ else:
+     infer_gs = gr.Checkbox(
+         label="Infer 3D Gaussian Splatting (Not Available - gsplat not installed)",
+         interactive=False,
+         value=False
+     )
+ ```
+
+ **Pros:**
+ - ✅ The app can still start
+ - ✅ All other features keep working
+ - ✅ Users can install gsplat selectively
+
+ **Cons:**
+ - ⚠️ Requires code changes
+ - ⚠️ 3DGS features are unavailable
+
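+ To round this out, the request path can also fail loudly instead of with a stack trace. The following is a hypothetical sketch (the handler name, its arguments, and the exact module exposing `GSPLAT_AVAILABLE` are assumptions, not the project's actual API):
+
+ ```python
+ # Hypothetical guard in the Gradio event handler: refuse a 3DGS request
+ # with a readable message when gsplat is missing.
+ import gradio as gr
+
+ from depth_anything_3.utils.export.gs import GSPLAT_AVAILABLE
+
+ def handle_reconstruct(images, infer_gs: bool):
+     if infer_gs and not GSPLAT_AVAILABLE:
+         raise gr.Error("3D Gaussian Splatting is disabled: gsplat is not installed.")
+     ...  # regular inference path
+ ```
+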
+ ---
+
+ ## Solution 4️⃣: Custom Docker Build
+
+ Create a custom Docker image and pre-compile gsplat in it:
+
+ **Dockerfile:**
+ ```dockerfile
+ FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
+
+ WORKDIR /app
+
+ # Install build dependencies
+ RUN apt-get update && apt-get install -y \
+     git \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Pre-compile gsplat
+ RUN pip install 'gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70'
+
+ # Install the remaining dependencies
+ COPY requirements.txt .
+ RUN pip install -r requirements.txt
+
+ # Copy the code
+ COPY . .
+
+ CMD ["python", "app.py"]
+ ```
+
+ **Pros:**
+ - ✅ Full control over the build environment
+ - ✅ Compiled results can be cached
+ - ✅ More reliable
+
+ **Cons:**
+ - ⚠️ Requires Docker knowledge
+ - ⚠️ Large image
+ - ⚠️ Long build and push times
+
+ ---
+
+ ## Solution 5️⃣: Control Installation via an Environment Variable
+
+ **requirements.txt:**
+ ```txt
+ torch>=2.0.0
+ gradio>=5.0.0
+ # base dependencies...
+ ```
+
+ **Install script** (install_gsplat.sh):
+ ```bash
+ #!/bin/bash
+ if [ "$INSTALL_GSPLAT" = "true" ]; then
+     echo "Installing gsplat..."
+     pip install 'gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70'
+ else
+     echo "Skipping gsplat installation"
+ fi
+ ```
+
+ Add the environment variable in the HF Spaces settings:
+ ```
+ INSTALL_GSPLAT=true
+ ```
+
+ **Pros:**
+ - ✅ Flexible
+ - ✅ Easy to toggle
+
+ **Cons:**
+ - ⚠️ Needs an extra script
+ - ⚠️ Not a standard mechanism
+
+ ---
+
+ ## 🔧 Current Recommended Configuration
+
+ ### First attempt: Solution 1 (already configured) ✅
+
+ **requirements.txt:**
+ ```txt
+ gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70
+ ```
+
+ **Space settings:**
+ - Hardware: **GPU (T4 or higher)**
+ - Python version: 3.11
+
+ ### If the build fails: Solution 3 (lazy loading)
+
+ Remove gsplat from requirements.txt and make it optional in the code.
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Problem 1: Build timeout
+
+ **Error message:**
+ ```
+ Building wheels for collected packages: gsplat
+ Building wheel for gsplat (setup.py) ... [TIMEOUT]
+ ```
+
+ **Fix:**
+ 1. Confirm the Space type is a **GPU Space**
+ 2. Try a different commit/tag that builds faster
+ 3. Consider Solution 3 (optional dependency)
+
+ ### Problem 2: CUDA not available
+
+ **Error message:**
+ ```
+ torch.cuda.is_available() returned False
+ CUDA extension build requires CUDA to be available
+ ```
+
+ **Fix:**
+ 1. Make sure the GPU is enabled at build time
+ 2. Check that PyTorch is a CUDA build
+ 3. See the [HF Spaces GPU docs](https://huggingface.co/docs/hub/spaces-gpus)
+
+ ### Problem 3: Compilation error
+
+ **Error message:**
+ ```
+ error: command 'gcc' failed with exit status 1
+ ```
+
+ **Fix:**
+ 1. Add a packages.txt that installs the build tools:
+ ```txt
+ build-essential
+ ```
+ 2. Or use a prebuilt release
+
+ ---
+
+ ## 📊 Comparison of Solutions
+
+ | Solution | Build time | Success rate | Complexity | Recommendation |
+ |------|---------|--------|--------|--------|
+ | 1. Direct GitHub install | 🐌 10-15 min | ⚠️ 70% | Simple | ⭐⭐⭐ |
+ | 2. Prebuilt PyPI wheel | ⚡ 1 min | ✅ 95% | Simplest | ⭐⭐⭐⭐⭐ |
+ | 3. Optional dependency | ⚡ 2 min | ✅ 100% | Moderate | ⭐⭐⭐⭐ |
+ | 4. Docker | 🐌 20-30 min | ✅ 95% | Complex | ⭐⭐ |
+ | 5. Environment variable | 🐌 10-15 min | ⚠️ 70% | Moderate | ⭐⭐ |
+
+ ---
+
+ ## 🎯 Implementation Steps
+
+ ### Now (done) ✅
+
+ 1. ✅ gsplat is enabled in requirements.txt
+ 2. ✅ Python version set to 3.11
+ 3. ✅ README.md configured
+
+ ### After pushing to HF Spaces
+
+ 1. **Watch the build logs**
+    - Check whether gsplat installs successfully
+    - Check that the build time is reasonable
+
+ 2. **If the build succeeds** 🎉
+    - Test the 3DGS features
+    - Done!
+
+ 3. **If the build fails** ⚠️
+    - Copy the error message
+    - Fix it using the troubleshooting guide above
+    - Or switch to Solution 3 (optional dependency)
+
+ ---
+
+ ## 📝 Test Checklist
+
+ Local tests before deploying:
+
+ ```bash
+ # 1. Test the gsplat install
+ pip install 'gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70'
+
+ # 2. Test the import
+ python -c "import gsplat; print('gsplat version:', gsplat.__version__)"
+
+ # 3. Test the project code
+ python -c "from depth_anything_3.utils.export.gs import export_to_gs_video; print('✅ import success')"
+
+ # 4. Launch the app
+ python app.py
+ ```
+
+ ---
+
+ ## 🔗 Related Resources
+
+ - [gsplat GitHub](https://github.com/nerfstudio-project/gsplat)
+ - [HF Spaces GPU docs](https://huggingface.co/docs/hub/spaces-gpus)
+ - [PyTorch CUDA installation](https://pytorch.org/get-started/locally/)
+
+ ---
+
+ ## 💡 Final Recommendation
+
+ 1. **Try Solution 1 first** (the current configuration): build directly on HF Spaces
+ 2. **If it fails**, switch to **Solution 3** (optional dependency) so the app can run without gsplat
+ 3. **Long term**: switch to Solution 2 as soon as gsplat publishes a PyPI release
+
+ Good luck with the deployment! 🚀
+
HF_SPACES_BUILD.md ADDED
@@ -0,0 +1,306 @@
+ # How Hugging Face Spaces Builds and Installs the Environment
+
+ ## 🏗️ Build Flow Overview
+
+ ```mermaid
+ graph TD
+     A[Push code to the Space] --> B[Detect the SDK type]
+     B --> C[Read the README.md configuration]
+     C --> D[Look for dependency files]
+     D --> E{Dependency file type}
+     E -->|requirements.txt| F[pip install -r requirements.txt]
+     E -->|pyproject.toml| G[pip install -e .]
+     E -->|packages.txt| H[apt-get install]
+     F --> I[Start the app]
+     G --> I
+     H --> I
+     I --> J[Run app.py]
+ ```
+
+ ## 📋 Step by Step
+
+ ### Step 1: Space configuration detection
+
+ HF Spaces reads the YAML front matter of `README.md`:
+
+ ```yaml
+ ---
+ title: Depth Anything 3
+ emoji: 🏢
+ colorFrom: indigo
+ colorTo: pink
+ sdk: gradio          # 🔑 key: use the Gradio SDK
+ sdk_version: 5.49.1  # Gradio version
+ app_file: app.py     # 🔑 key: entry point
+ pinned: false
+ license: cc-by-nc-4.0
+ ---
+ ```
+
+ ### Step 2: Dependency file precedence
+
+ HF Spaces looks for dependency files in the following order (the first one found wins):
+
+ #### 1. `requirements.txt` ⭐ (most recommended)
+
+ ```txt
+ torch>=2.0.0
+ gradio>=5.0.0
+ spaces
+ numpy<2
+ ```
+
+ **Install command:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ **Pros:**
+ - ✅ Simple and direct
+ - ✅ Fast builds
+ - ✅ Best compatibility
+ - ✅ Clear error messages
+
+ #### 2. `pyproject.toml` (what this project currently uses)
+
+ ```toml
+ [project]
+ dependencies = ["torch>=2", "numpy<2"]
+
+ [project.optional-dependencies]
+ app = ["gradio>=5", "spaces"]
+ ```
+
+ **Install command:**
+ ```bash
+ pip install -e .
+ # or, including the optional dependencies
+ pip install -e ".[app]"
+ ```
+
+ **Issues:**
+ - ⚠️ `[project.optional-dependencies]` may not be installed automatically
+ - ⚠️ Requires a correct package layout (`src/` directory etc.)
+ - ⚠️ Longer build time
+
+ #### 3. `packages.txt` (system-level dependencies)
+
+ ```txt
+ ffmpeg
+ libsm6
+ libxext6
+ ```
+
+ **Install command:**
+ ```bash
+ apt-get update
+ apt-get install -y ffmpeg libsm6 libxext6
+ ```
+
+ **Purpose:**
+ - Install system libraries (not Python packages)
+ - System libraries OpenCV may need
+ - Audio/video tooling
+
+ ### Step 3: What the build actually runs
+
+ ```bash
+ # === Commands HF Spaces runs internally (simplified) ===
+
+ # 1. Prepare the environment
+ export HOME=/home/user
+ export PYTHONPATH=/home/user/app:$PYTHONPATH
+
+ # 2. Set up the base Python environment
+ python -m pip install --upgrade pip setuptools wheel
+
+ # 3. Install system dependencies (if packages.txt exists)
+ if [ -f packages.txt ]; then
+     apt-get update
+     xargs -a packages.txt apt-get install -y
+ fi
+
+ # 4. Install Python dependencies
+ if [ -f requirements.txt ]; then
+     pip install -r requirements.txt
+ elif [ -f pyproject.toml ]; then
+     pip install -e .
+ fi
+
+ # 5. Start the app
+ python app.py
+ ```
+
+ ## 🔍 Build Analysis for This Project
+
+ ### Current issue: relying on pyproject.toml
+
+ The project's `pyproject.toml` looks like this:
+
+ ```toml
+ [project]
+ dependencies = [
+     "torch>=2",
+     # ❌ no gradio here!
+     # ...
+ ]
+
+ [project.optional-dependencies]
+ app = ["gradio>=5", "spaces"]  # ✅ gradio lives here
+ ```
+
+ **Problem:**
+ - HF Spaces may only install `dependencies`, not `optional-dependencies`
+ - As a result `gradio` and `spaces` may never be installed
+
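+ A cheap way to surface this early is a fail-fast check at the top of the entry point. A minimal sketch (not part of the repository; the package list is just an example):
+
+ ```python
+ # Hypothetical startup check: abort with a clear message if the Space was
+ # built without the app-facing dependencies.
+ import importlib.util
+ import sys
+
+ REQUIRED = ("gradio", "spaces", "torch")
+ missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]
+ if missing:
+     sys.exit(
+         f"Missing packages: {', '.join(missing)}. "
+         "Add them to requirements.txt (or [project.dependencies]) and rebuild."
+     )
+ ```
+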
+ ### Fix 1: Use requirements.txt (recommended) ✅
+
+ A `requirements.txt` has already been added to the repo, and HF Spaces will prefer it:
+
+ ```bash
+ # Spaces runs this automatically
+ pip install -r requirements.txt
+ ```
+
+ ### Fix 2: Change pyproject.toml
+
+ Move gradio into the main dependencies:
+
+ ```toml
+ [project]
+ dependencies = [
+     "torch>=2",
+     "gradio>=5",
+     "spaces",
+     # ... other dependencies
+ ]
+ ```
+
+ ### Fix 3: Create a .spacesrc
+
+ Create a `.spacesrc` file to customize the build:
+
+ ```bash
+ pip install -e ".[app,gs]"
+ ```
+
+ ## 🚀 Recommended Layout
+
+ Recommended file structure for an HF Spaces deployment:
+
+ ```
+ depth-anything-3/
+ ├── app.py               # entry point
+ ├── requirements.txt     # Python dependencies (preferred)
+ ├── packages.txt         # system dependencies (optional)
+ ├── README.md            # Space configuration
+ ├── src/
+ │   └── depth_anything_3/
+ │       └── ...
+ └── pyproject.toml       # project configuration (fallback)
+ ```
+
+ ## ⚡ Build Optimization Tips
+
+ ### 1. Pin version numbers
+
+ ```txt
+ # ❌ Not recommended (unstable builds)
+ torch>=2
+ gradio>=5
+
+ # ✅ Recommended (stable builds)
+ torch==2.1.0
+ gradio==5.49.1
+ ```
+
+ ### 2. Prefer prebuilt wheels
+
+ Use versions that have prebuilt wheels on PyPI and avoid compiling from source:
+
+ ```txt
+ # ✅ Fast install
+ torch==2.1.0
+ torchvision==0.16.0
+
+ # ⚠️ Slow (compiled from source)
+ gsplat @ git+https://github.com/...
+ ```
+
+ ### 3. Use Docker (advanced)
+
+ Build a custom Docker image:
+
+ ```dockerfile
+ FROM python:3.10
+ WORKDIR /app
+ COPY requirements.txt .
+ RUN pip install -r requirements.txt
+ COPY . .
+ CMD ["python", "app.py"]
+ ```
+
+ ## 🐛 FAQ
+
+ ### Q1: Why did the build fail?
+
+ **Checklist:**
+ 1. ✅ Does the dependency file exist?
+ 2. ✅ Are the version pins compatible?
+ 3. ✅ Are system dependencies needed (packages.txt)?
+ 4. ✅ Are the package names correct?
+
+ ### Q2: How do I see the build logs?
+
+ On the Space page:
+ 1. Click "Settings" in the top-right corner
+ 2. Scroll to "Build logs"
+ 3. Inspect the detailed log
+
+ ### Q3: What if the build takes too long?
+
+ **Ways to speed it up:**
+ 1. Use `requirements.txt` instead of `pyproject.toml`
+ 2. Remove unnecessary dependencies
+ 3. Use prebuilt wheels
+ 4. Consider Docker image caching
+
+ ### Q4: It works locally but fails on Spaces?
+
+ **Possible causes:**
+ 1. Missing system dependencies (packages.txt needed)
+ 2. Path issues (absolute paths that only exist locally)
+ 3. Different environment variables
+ 4. Different Python version
+
+ **Fix:**
+ ```yaml
+ # Specify the Python version in README.md
+ ---
+ sdk: gradio
+ python_version: 3.10
+ ---
+ ```
+
+ ## 📊 Build Time Reference
+
+ | Dependency method | Average build time | Stability |
+ |---------|------------|--------|
+ | requirements.txt | 2-5 min | ⭐⭐⭐⭐⭐ |
+ | pyproject.toml | 5-10 min | ⭐⭐⭐ |
+ | Compiling from source | 10-30 min | ⭐⭐ |
+
+ ## 🎯 Best Practices
+
+ 1. **Use requirements.txt** as the primary dependency manager
+ 2. **Pin the versions of key dependencies**
+ 3. **Test locally** with `pip install -r requirements.txt`
+ 4. **Watch the build logs** to catch problems early
+ 5. **Add dependencies incrementally** and test one at a time rather than all at once
+
+ ## 🔗 Related Resources
+
+ - [HF Spaces documentation](https://huggingface.co/docs/hub/spaces)
+ - [Gradio Spaces guide](https://huggingface.co/docs/hub/spaces-sdks-gradio)
+ - [Dependency management](https://huggingface.co/docs/hub/spaces-dependencies)
+
PYTHON_VERSION_CONFIG.md ADDED
@@ -0,0 +1,290 @@
+ # Python Version Configuration
+
+ ## 📋 Where the Python Version Is Configured
+
+ ### ✅ The three places already configured:
+
+ ---
+
+ ## 1️⃣ README.md (Hugging Face Spaces) ⭐ **most important**
+
+ ```yaml
+ ---
+ title: Depth Anything 3
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ python_version: 3.11   # 🔑 the key setting
+ ---
+ ```
+
+ **Scope:** Hugging Face Spaces deployment
+ **Priority:** 🔥 highest (Spaces-specific)
+
+ **Supported versions:**
+ - `3.8`
+ - `3.9`
+ - `3.10`
+ - `3.11` ✅ (the one chosen here)
+ - `3.12` (newer; may have compatibility issues)
+
+ **Notes:**
+ - This is the only setting HF Spaces recognizes
+ - If unspecified, the default is `3.10`
+ - It must be an exact version (e.g. `3.11`), not a range (e.g. `>=3.11`)
+
+ ---
+
+ ## 2️⃣ pyproject.toml (project configuration)
+
+ ```toml
+ [project]
+ requires-python = ">=3.11"  # ✅ already configured
+ ```
+
+ **Scope:**
+ - Local development
+ - Version check when installing with pip
+ - Package managers (poetry, hatch, etc.)
+
+ **Priority:** medium
+
+ **Supported formats:**
+ ```toml
+ requires-python = ">=3.11"         # at least 3.11
+ requires-python = ">=3.11, <3.13"  # 3.11 through 3.12
+ requires-python = "~=3.11"         # compatible release: >=3.11, <4.0
+ ```
+
+ **Effect:**
+ ```bash
+ # If the Python version does not satisfy the constraint, installation fails
+ $ pip install .
+ ERROR: Package requires a different Python: 3.9.0 not in '>=3.11'
+ ```
+
+ ---
+
+ ## 3️⃣ runtime.txt (fallback)
+
+ ```txt
+ python-3.11
+ ```
+
+ **Scope:**
+ - Heroku
+ - Some Docker-based build systems
+ - HF Spaces (as a fallback when README.md has no setting)
+
+ **Priority:** low
+
+ **Format:**
+ ```txt
+ python-3.11    # ✅ exact version
+ python-3.11.5  # ✅ even more precise
+ ```
+
+ ---
+
+ ## 🎯 Configuration Precedence (Hugging Face Spaces)
+
+ ```
+ README.md (python_version)
+   ↓ highest priority
+ runtime.txt
+   ↓ secondary
+ default version (3.10)
+   ↓ fallback
+ ```
+
+ **Best practice:** configure both `README.md` and `pyproject.toml`
+
+ ---
+
+ ## 🔍 How to Verify the Configuration Took Effect
+
+ ### On Hugging Face Spaces:
+
+ After deploying, check the build logs:
+
+ ```bash
+ # The log should show
+ Setting up Python 3.11...
+ Python 3.11.5
+ pip 23.2.1
+ ```
+
+ ### Locally:
+
+ ```bash
+ # Check the Python version
+ python --version
+ # Python 3.11.5
+
+ # Try installing (checks requires-python)
+ pip install -e .
+ # Fails if the version does not satisfy the constraint
+ ```
+
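+ Beyond the shell checks above, a runtime guard can also catch a misconfigured interpreter on the Space itself. A minimal sketch (not part of the repository) that could sit at the top of app.py:
+
+ ```python
+ # Hypothetical runtime guard: refuse to start on an interpreter older than
+ # the version the project is configured for (3.11, per README.md).
+ import sys
+
+ if sys.version_info < (3, 11):
+     raise RuntimeError(
+         f"Python {sys.version.split()[0]} detected; this Space expects "
+         "python_version: 3.11. Check README.md / runtime.txt."
+     )
+ ```
+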
+ ---
+
+ ## 🚨 FAQ
+
+ ### Q1: Why Python 3.11?
+
+ **Pros:**
+ - ✅ Faster (10-60% over 3.10)
+ - ✅ Better error messages
+ - ✅ New typing features
+ - ✅ Fully supported by Gradio 5+
+
+ **Caveats:**
+ - ⚠️ Some older libraries may not support it (e.g. gsplat)
+ - ⚠️ All dependencies need to be tested for compatibility
+
+ ### Q2: What if I want to support several versions?
+
+ **pyproject.toml:**
+ ```toml
+ requires-python = ">=3.11, <3.13"  # supports 3.11 and 3.12
+ ```
+
+ **But HF Spaces can only pick one:**
+ ```yaml
+ python_version: 3.11  # exactly one version can be specified
+ ```
+
+ ### Q3: How do I test different Python versions?
+
+ **With pyenv:**
+ ```bash
+ # Install several Python versions
+ pyenv install 3.11.5
+ pyenv install 3.12.0
+
+ # Switch versions and test
+ pyenv local 3.11.5
+ python --version
+ pip install -e .
+ python app.py
+ ```
+
+ **With Docker:**
+ ```dockerfile
+ FROM python:3.11
+ WORKDIR /app
+ COPY . .
+ RUN pip install -r requirements.txt
+ CMD ["python", "app.py"]
+ ```
+
+ ### Q4: What about version conflicts?
+
+ **Scenario:** a dependency does not support Python 3.11
+
+ **Options:**
+
+ 1. **Find a replacement package**
+ ```txt
+ # requirements.txt
+ old-package   # no 3.11 support
+
+ new-package   # supports 3.11
+ ```
+
+ 2. **Downgrade the Python version**
+ ```yaml
+ python_version: 3.10  # go back to 3.10
+ ```
+
+ 3. **Wait for an upstream fix**
+ ```bash
+ pip install git+https://github.com/xxx/package@main
+ ```
+
+ ---
+
+ ## 📊 Python Version Compatibility Reference
+
+ | Python version | Gradio 5 | PyTorch 2.x | Spaces support | Recommended |
+ |------------|----------|-------------|------------|------|
+ | 3.8 | ✅ | ✅ | ✅ | ❌ (too old) |
+ | 3.9 | ✅ | ✅ | ✅ | ⚠️ |
+ | 3.10 | ✅ | ✅ | ✅ | ✅ |
+ | 3.11 | ✅ | ✅ | ✅ | ⭐ recommended |
+ | 3.12 | ✅ | ⚠️ | ✅ | ⚠️ (newer) |
+ | 3.13 | ⚠️ | ❌ | ⚠️ | ❌ (too new) |
+
+ ---
+
+ ## 🎓 Complete Configuration Examples
+
+ ### The current configuration (done) ✅
+
+ **README.md:**
+ ```yaml
+ ---
+ python_version: 3.11
+ ---
+ ```
+
+ **pyproject.toml:**
+ ```toml
+ requires-python = ">=3.11"
+ ```
+
+ **runtime.txt:**
+ ```txt
+ python-3.11
+ ```
+
+ ### To downgrade to 3.10:
+
+ **README.md:**
+ ```yaml
+ python_version: 3.10
+ ```
+
+ **pyproject.toml:**
+ ```toml
+ requires-python = ">=3.10"
+ ```
+
+ **runtime.txt:**
+ ```txt
+ python-3.10
+ ```
+
+ ---
+
+ ## 🔧 Pre-Deployment Checklist
+
+ Before deploying, check:
+
+ - [ ] ✅ README.md has `python_version: 3.11`
+ - [ ] ✅ pyproject.toml has `requires-python = ">=3.11"`
+ - [ ] ✅ Local testing uses Python 3.11
+ - [ ] ✅ All dependencies support Python 3.11
+ - [ ] ✅ requirements.txt lists all dependencies
+ - [ ] ✅ app.py starts cleanly
+
+ ---
+
+ ## 📚 References
+
+ - [HF Spaces Python version docs](https://huggingface.co/docs/hub/spaces-config-reference#python_version)
+ - [Python release schedule](https://devguide.python.org/versions/)
+ - [PyPI package compatibility lookup](https://pypi.org/)
+
+ ---
+
+ ## 💡 Summary
+
+ **For a Hugging Face Spaces deployment:**
+
+ 1. **Required:** `python_version: 3.11` in `README.md`
+ 2. **Recommended:** `requires-python = ">=3.11"` in `pyproject.toml`
+ 3. **Optional:** `runtime.txt` (fallback)
+
+ **Current status:** ✅ all three are configured!
+
README.md CHANGED
@@ -6,6 +6,7 @@ colorTo: pink
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
+ python_version: 3.11
  pinned: false
  license: cc-by-nc-4.0
  ---
SPACES_SETUP.md ADDED
@@ -0,0 +1,190 @@
+ # Hugging Face Spaces Deployment Guide
+
+ ## 📋 Overview
+
+ This project is already set up for deployment to Hugging Face Spaces; it uses the `@spaces.GPU` decorator to allocate GPU resources on demand.
+
+ ## 🎯 Key Files
+
+ ### 1. `app.py` - main application file
+
+ ```python
+ import spaces
+ from depth_anything_3.app.gradio_app import DepthAnything3App
+ from depth_anything_3.app.modules.model_inference import ModelInference
+
+ # Apply the GPU decorator to the inference function via monkey-patching
+ original_run_inference = ModelInference.run_inference
+
+ @spaces.GPU(duration=120)  # request the GPU for at most 120 seconds
+ def gpu_run_inference(self, *args, **kwargs):
+     return original_run_inference(self, *args, **kwargs)
+
+ ModelInference.run_inference = gpu_run_inference
+ ```
+
+ **How it works:**
+ - The `@spaces.GPU` decorator allocates a GPU dynamically when the function is called
+ - `duration=120` means a single inference may hold the GPU for at most 120 seconds
+ - Monkey-patching applies the decorator to the existing inference function without touching the core code
+
+ ### 2. `README.md` - Spaces configuration
+
+ ```yaml
+ ---
+ title: Depth Anything 3
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: cc-by-nc-4.0
+ ---
+ ```
+
+ This YAML front matter tells Hugging Face Spaces:
+ - to use the Gradio SDK
+ - that the entry point is `app.py`
+ - which Gradio version to use
+
+ ### 3. `pyproject.toml` - dependency configuration
+
+ Already updated to include the `spaces` dependency:
+
+ ```toml
+ [project.optional-dependencies]
+ app = ["gradio>=5", "pillow>=9.0", "spaces"]
+ ```
+
+ ## 🚀 Deployment Steps
+
+ ### Option 1: Via the Hugging Face web interface
+
+ 1. Create a new Space on Hugging Face
+ 2. Choose **Gradio** as the SDK
+ 3. Upload the code (including `app.py`, `src/`, `pyproject.toml`, etc.)
+ 4. The Space builds and starts automatically
+
+ ### Option 2: Via Git
+
+ ```bash
+ # Clone your Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+ cd YOUR_SPACE_NAME
+
+ # Add your code
+ cp -r /path/to/depth-anything-3/* .
+
+ # Commit and push
+ git add .
+ git commit -m "Initial commit"
+ git push
+ ```
+
+ ## 🔧 Configuration Options
+
+ ### GPU type
+
+ Hugging Face Spaces offers several GPU tiers:
+
+ - **Free (T4)**: free, suitable for small models
+ - **A10G**: paid, more powerful
+ - **A100**: paid, most powerful
+
+ ### GPU duration
+
+ Adjustable in `app.py`:
+
+ ```python
+ @spaces.GPU(duration=120)  # 120 seconds
+ ```
+
+ - Too short: complex inference may time out
+ - Too long: resources are wasted
+ - Recommendation: base it on actual inference time (start generous, then tune using the logs)
+
+ ### Environment variables
+
+ These can be set in the Space settings:
+
+ - `DA3_MODEL_DIR`: model directory
+ - `DA3_WORKSPACE_DIR`: workspace directory
+ - `DA3_GALLERY_DIR`: gallery directory
+
+ ## 📊 Monitoring and Debugging
+
+ ### Viewing the logs
+
+ Click the "Logs" tab in the Spaces UI to see:
+
+ ```
+ 🚀 Launching Depth Anything 3 on Hugging Face Spaces...
+ 📦 Model Directory: depth-anything/DA3NESTED-GIANT-LARGE
+ 📁 Workspace Directory: workspace/gradio
+ 🖼️ Gallery Directory: workspace/gallery
+ ```
+
+ ### GPU status
+
+ Inside the decorated function you can check the GPU state:
+
+ ```python
+ print(torch.cuda.is_available())      # True
+ print(torch.cuda.device_count())      # usually 1
+ print(torch.cuda.get_device_name(0))  # 'Tesla T4' or similar
+ ```
+
+ ## 🎓 Example Code
+
+ See `example_spaces_gpu.py` for basic usage of the `@spaces.GPU` decorator.
+
+ ## ❓ FAQ
+
+ ### Q: Why monkey-patching?
+
+ A: It adds Spaces support without modifying the core code. For a cleaner approach you could instead:
+
+ 1. Put the decorator directly on the `ModelInference.run_inference` method
+ 2. Create a subclass of `ModelInference`
+
+ ### Q: How do I check that it runs locally?
+
+ A: When running locally with the `spaces` package installed, the decorator is effectively a no-op and the function runs as usual; if the package is not installed, the `import spaces` in app.py will fail (see the fallback sketch below).
+
+ ```bash
+ # Local test
+ python app.py
+ ```
+
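+ If app.py should also run on machines where the `spaces` package is not installed at all, a small fallback shim can stand in for the decorator. The following is a sketch under that assumption, not part of the current app.py:
+
+ ```python
+ # Hypothetical fallback shim: use the real spaces.GPU on HF Spaces, and a
+ # no-op decorator locally when the `spaces` package is missing.
+ try:
+     import spaces
+     gpu_decorator = spaces.GPU
+ except ImportError:  # local run without the spaces package
+     def gpu_decorator(*args, **kwargs):
+         def wrap(fn):
+             return fn
+         return wrap
+
+ @gpu_decorator(duration=120)
+ def run_inference_example(x):
+     # On Spaces this runs with a GPU allocated; locally it runs unchanged.
+     return x
+ ```
+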
+ ### Q: Can I decorate more than one function?
+
+ A: Yes. Add `@spaces.GPU` to any function that needs the GPU.
+
+ ```python
+ @spaces.GPU(duration=60)
+ def function1():
+     pass
+
+ @spaces.GPU(duration=120)
+ def function2():
+     pass
+ ```
+
+ ### Q: How do I optimize GPU usage?
+
+ A: A few suggestions (see the sketch after this list):
+
+ 1. **Only decorate what needs it**: not the whole app, only the inference functions that actually use the GPU
+ 2. **Pick a sensible duration**: set it according to real needs
+ 3. **Free GPU memory**: call `torch.cuda.empty_cache()` when the function finishes
+ 4. **Batch requests**: process several requests together when possible
+
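+ For point 3, one convenient pattern is to release the cached memory in a `finally` block so it also runs when inference raises. A minimal sketch (the model/batch arguments are placeholders, not the project's actual API):
+
+ ```python
+ # Hypothetical wrapper around an inference call: always release cached CUDA
+ # memory, even on failure, so the next request starts from a clean slate.
+ import torch
+
+ def run_with_cleanup(model, batch):
+     try:
+         with torch.inference_mode():
+             return model(batch)
+     finally:
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+ ```
+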
+ ## 🔗 Related Resources
+
+ - [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
+ - [Spaces GPU guide](https://huggingface.co/docs/hub/spaces-gpus)
+ - [Gradio documentation](https://gradio.app/docs)
+
+ ## 📝 License
+
+ Apache-2.0
+
app.py ADDED
@@ -0,0 +1,73 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Hugging Face Spaces App for Depth Anything 3.
17
+
18
+ This app uses the @spaces.GPU decorator to dynamically allocate GPU resources
19
+ for model inference on Hugging Face Spaces.
20
+ """
21
+
22
+ import os
23
+ import spaces
24
+ from depth_anything_3.app.gradio_app import DepthAnything3App
25
+ from depth_anything_3.app.modules.model_inference import ModelInference
26
+
27
+ # Monkey-patch the run_inference method to use @spaces.GPU decorator
28
+ # This allows dynamic GPU allocation on Hugging Face Spaces
29
+ original_run_inference = ModelInference.run_inference
30
+
31
+ @spaces.GPU(duration=120) # Request GPU for up to 120 seconds per inference
32
+ def gpu_run_inference(self, *args, **kwargs):
33
+ """
34
+ GPU-accelerated inference with Spaces decorator.
35
+
36
+ This function wraps the original run_inference method with @spaces.GPU,
37
+ which ensures the model is moved to GPU when needed on HF Spaces.
38
+ """
39
+ return original_run_inference(self, *args, **kwargs)
40
+
41
+ # Replace the original method with the GPU-decorated version
42
+ ModelInference.run_inference = gpu_run_inference
43
+
44
+ # Initialize and launch the app
45
+ if __name__ == "__main__":
46
+ # Configure directories for Hugging Face Spaces
47
+ model_dir = os.environ.get("DA3_MODEL_DIR", "depth-anything/DA3NESTED-GIANT-LARGE")
48
+ workspace_dir = os.environ.get("DA3_WORKSPACE_DIR", "workspace/gradio")
49
+ gallery_dir = os.environ.get("DA3_GALLERY_DIR", "workspace/gallery")
50
+
51
+ # Create directories if they don't exist
52
+ os.makedirs(workspace_dir, exist_ok=True)
53
+ os.makedirs(gallery_dir, exist_ok=True)
54
+
55
+ # Initialize the app
56
+ app = DepthAnything3App(
57
+ model_dir=model_dir,
58
+ workspace_dir=workspace_dir,
59
+ gallery_dir=gallery_dir
60
+ )
61
+
62
+ # Launch with Spaces-friendly settings
63
+ print("🚀 Launching Depth Anything 3 on Hugging Face Spaces...")
64
+ print(f"📦 Model Directory: {model_dir}")
65
+ print(f"📁 Workspace Directory: {workspace_dir}")
66
+ print(f"🖼️ Gallery Directory: {gallery_dir}")
67
+
68
+ app.launch(
69
+ host="0.0.0.0", # Required for Spaces
70
+ port=7860, # Standard Gradio port
71
+ share=False, # Not needed on Spaces
72
+ debug=False
73
+ )
example_spaces_gpu.py ADDED
@@ -0,0 +1,52 @@
1
+ """
2
+ Simple example demonstrating @spaces.GPU decorator usage.
3
+
4
+ This example shows how the @spaces.GPU decorator works:
5
+ - Variables created outside the decorated function stay on CPU initially
6
+ - When the decorated function is called, the process moves to GPU environment
7
+ - Inside the decorated function, tensors can access CUDA
8
+ """
9
+
10
+ import gradio as gr
11
+ import spaces
12
+ import torch
13
+
14
+ # This tensor is created at module load time
15
+ # On HF Spaces, it will be on CPU until a @spaces.GPU function is called
16
+ zero = torch.Tensor([0])
17
+
18
+ # Try to move to cuda - will fail gracefully if no GPU available
19
+ try:
20
+ zero = zero.cuda()
21
+ print(f"Initial device: {zero.device}") # On Spaces: shows 'cpu' 🤔
22
+ except:
23
+ print(f"Initial device: {zero.device}") # cpu (no GPU available yet)
24
+
25
+
26
+ @spaces.GPU(duration=60) # Request GPU for up to 60 seconds
27
+ def greet(n):
28
+ """
29
+ This function runs on GPU when called.
30
+ The @spaces.GPU decorator ensures GPU access.
31
+ """
32
+ # Inside the decorated function, we have GPU access
33
+ print(f"Inside GPU function - device: {zero.device}") # On Spaces: shows 'cuda:0' 🤗
34
+
35
+ # Perform GPU computation
36
+ result = zero + n
37
+
38
+ return f"Hello {result.item()} Tensor! (computed on {zero.device})"
39
+
40
+
41
+ # Create Gradio interface
42
+ demo = gr.Interface(
43
+ fn=greet,
44
+ inputs=gr.Number(value=42, label="Enter a number"),
45
+ outputs=gr.Text(label="Result"),
46
+ title="Spaces GPU Example",
47
+ description="Demonstrates @spaces.GPU decorator usage"
48
+ )
49
+
50
+ if __name__ == "__main__":
51
+ demo.launch()
52
+
packages.txt ADDED
@@ -0,0 +1,3 @@
1
+ build-essential
2
+ git
3
+
pyproject.toml ADDED
@@ -0,0 +1,93 @@
1
+ [build-system]
2
+ requires = ["hatchling>=1.25", "hatch-vcs>=0.4"]
3
+ build-backend = "hatchling.build"
4
+
5
+ [project]
6
+ name = "depth-anything-3"
7
+ version = "0.0.0"
8
+ description = "Depth Anything 3"
9
+ readme = "README.md"
10
+ requires-python = ">=3.11"
11
+ license = { text = "Apache-2.0" }
12
+ authors = [{ name = "Your Name" }]
13
+
14
+ dependencies = [
15
+ "pre-commit",
16
+ "trimesh",
17
+ "torch>=2",
18
+ "torchvision",
19
+ "einops",
20
+ "huggingface_hub",
21
+ "imageio",
22
+ "numpy<2",
23
+ "opencv-python",
24
+ "xformers",
25
+ "open3d",
26
+ "fastapi",
27
+ "unicorn",
28
+ "requests",
29
+ "typer",
30
+ "pillow",
31
+ "omegaconf",
32
+ "evo",
33
+ "e3nn",
34
+ "moviepy",
35
+ "plyfile",
36
+ "pillow_heif",
37
+ "safetensors",
38
+ "uvicorn",
39
+ "moviepy==1.0.3",
40
+ "typer>=0.9.0",
41
+ ]
42
+
43
+ [project.optional-dependencies]
44
+ app = ["gradio>=5", "pillow>=9.0", "spaces"] # requires that python3>=3.10
45
+ gs = ["gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70"]
46
+ all = ["depth-anything-3[app,gs]"]
47
+
48
+
49
+ [project.scripts]
50
+ da3 = "depth_anything_3.cli:app"
51
+
52
+ [project.urls]
53
+ Homepage = "https://github.com/ByteDance-Seed/Depth-Anything-3"
54
+
55
+ [tool.hatch.version]
56
+ source = "vcs"
57
+
58
+ [tool.hatch.build.targets.wheel]
59
+ packages = ["src/depth_anything_3"]
60
+
61
+ [tool.hatch.build.targets.sdist]
62
+ include = [
63
+ "/README.md",
64
+ "/pyproject.toml",
65
+ "/src/depth_anything_3",
66
+ ]
67
+
68
+ [tool.hatch.metadata]
69
+ allow-direct-references = true
70
+
71
+ [tool.mypy]
72
+ plugins = ["jaxtyping.mypy_plugin"]
73
+
74
+ [tool.black]
75
+ line-length = 99
76
+ target-version = ['py37', 'py38', 'py39', 'py310', 'py311']
77
+ include = '\.pyi?$'
78
+ exclude = '''
79
+ /(
80
+ | \.git
81
+ )/
82
+ '''
83
+
84
+ [tool.isort]
85
+ profile = "black"
86
+ multi_line_output = 3
87
+ include_trailing_comma = true
88
+ known_third_party = ["bson","cruise","cv2","dataloader","diffusers","omegaconf","tensorflow","torch","torchvision","transformers","gsplat"]
89
+ known_first_party = ["common", "data", "models", "projects"]
90
+ sections = ["FUTURE","STDLIB","THIRDPARTY","FIRSTPARTY","LOCALFOLDER"]
91
+ skip_gitignore = true
92
+ line_length = 99
93
+ no_lines_before="THIRDPARTY"
requirements-basic.txt ADDED
@@ -0,0 +1,41 @@
1
+ # Basic requirements without gsplat (for faster build)
2
+ # Use this if gsplat build fails on HF Spaces
3
+ # To use: rename this to requirements.txt
4
+
5
+ # Core dependencies
6
+ torch>=2.0.0
7
+ torchvision
8
+ einops
9
+ huggingface_hub
10
+ numpy<2
11
+ opencv-python
12
+
13
+ # Gradio and Spaces
14
+ gradio>=5.0.0
15
+ spaces
16
+ pillow>=9.0
17
+
18
+ # 3D and visualization
19
+ trimesh
20
+ open3d
21
+ plyfile
22
+
23
+ # Image processing
24
+ imageio
25
+ pillow_heif
26
+ safetensors
27
+
28
+ # Video processing
29
+ moviepy==1.0.3
30
+
31
+ # Math and geometry
32
+ e3nn
33
+
34
+ # Utilities
35
+ requests
36
+ omegaconf
37
+ xformers
38
+
39
+ # NOTE: gsplat is NOT included in this version
40
+ # 3DGS features will be disabled
41
+
requirements.txt ADDED
@@ -0,0 +1,38 @@
1
+ # Core dependencies
2
+ torch>=2.0.0
3
+ torchvision
4
+ einops
5
+ huggingface_hub
6
+ numpy<2
7
+ opencv-python
8
+
9
+ # Gradio and Spaces
10
+ gradio>=5.0.0
11
+ spaces
12
+ pillow>=9.0
13
+
14
+ # 3D and visualization
15
+ trimesh
16
+ open3d
17
+ plyfile
18
+
19
+ # Image processing
20
+ imageio
21
+ pillow_heif
22
+ safetensors
23
+
24
+ # Video processing
25
+ moviepy==1.0.3
26
+
27
+ # Math and geometry
28
+ e3nn
29
+
30
+ # Utilities
31
+ requests
32
+ omegaconf
33
+ xformers
34
+
35
+ # 3D Gaussian Splatting
36
+ # Note: This requires CUDA during build. If build fails on Spaces, see alternative solutions.
37
+ gsplat @ git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70
38
+
runtime.txt ADDED
@@ -0,0 +1,2 @@
1
+ python-3.11
2
+
src/depth_anything_3/__pycache__/api.cpython-311.pyc ADDED
Binary file (17.9 kB).
 
src/depth_anything_3/__pycache__/cfg.cpython-311.pyc ADDED
Binary file (6.98 kB).
 
src/depth_anything_3/__pycache__/cli.cpython-311.pyc ADDED
Binary file (27.2 kB).
 
src/depth_anything_3/__pycache__/registry.cpython-311.pyc ADDED
Binary file (1.71 kB).
 
src/depth_anything_3/__pycache__/specs.cpython-311.pyc ADDED
Binary file (1.73 kB).
 
src/depth_anything_3/api.py ADDED
@@ -0,0 +1,414 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ """
15
+ Depth Anything 3 API module.
16
+
17
+ This module provides the main API for Depth Anything 3, including model loading,
18
+ inference, and export capabilities. It supports both single and nested model architectures.
19
+ """
20
+
21
+ from __future__ import annotations
22
+
23
+ import time
24
+ from typing import Optional, Sequence
25
+ import numpy as np
26
+ import torch
27
+ import torch.nn as nn
28
+ from huggingface_hub import PyTorchModelHubMixin
29
+ from PIL import Image
30
+
31
+ from depth_anything_3.cfg import create_object, load_config
32
+ from depth_anything_3.registry import MODEL_REGISTRY
33
+ from depth_anything_3.specs import Prediction
34
+ from depth_anything_3.utils.export import export
35
+ from depth_anything_3.utils.geometry import affine_inverse
36
+ from depth_anything_3.utils.io.input_processor import InputProcessor
37
+ from depth_anything_3.utils.io.output_processor import OutputProcessor
38
+ from depth_anything_3.utils.logger import logger
39
+ from depth_anything_3.utils.pose_align import align_poses_umeyama
40
+
41
+ torch.backends.cudnn.benchmark = False
42
+ # logger.info("CUDNN Benchmark Disabled")
43
+
44
+ SAFETENSORS_NAME = "model.safetensors"
45
+ CONFIG_NAME = "config.json"
46
+
47
+
48
+ class DepthAnything3(nn.Module, PyTorchModelHubMixin):
49
+ """
50
+ Depth Anything 3 main API class.
51
+
52
+ This class provides a high-level interface for depth estimation using Depth Anything 3.
53
+ It supports both single and nested model architectures with metric scaling capabilities.
54
+
55
+ Features:
56
+ - Hugging Face Hub integration via PyTorchModelHubMixin
57
+ - Support for multiple model presets (vitb, vitg, nested variants)
58
+ - Automatic mixed precision inference
59
+ - Export capabilities for various formats (GLB, PLY, NPZ, etc.)
60
+ - Camera pose estimation and metric depth scaling
61
+
62
+ Usage:
63
+ # Load from Hugging Face Hub
64
+ model = DepthAnything3.from_pretrained("huggingface/model-name")
65
+
66
+ # Or create with specific preset
67
+ model = DepthAnything3(preset="vitg")
68
+
69
+ # Run inference
70
+ prediction = model.inference(images, export_dir="output", export_format="glb")
71
+ """
72
+
73
+ _commit_hash: str | None = None # Set by mixin when loading from Hub
74
+
75
+ def __init__(self, model_name: str = "da3-large", **kwargs):
76
+ """
77
+ Initialize DepthAnything3 with specified preset.
78
+
79
+ Args:
80
+ model_name: The name of the model preset to use.
81
+ Examples: 'da3-giant', 'da3-large', 'da3metric-large', 'da3nested-giant-large'.
82
+ **kwargs: Additional keyword arguments (currently unused).
83
+ """
84
+ super().__init__()
85
+ self.model_name = model_name
86
+
87
+ # Build the underlying network
88
+ self.config = load_config(MODEL_REGISTRY[self.model_name])
89
+ self.model = create_object(self.config)
90
+ self.model.eval()
91
+
92
+ # Initialize processors
93
+ self.input_processor = InputProcessor()
94
+ self.output_processor = OutputProcessor()
95
+
96
+ # Device management (set by user)
97
+ self.device = None
98
+
99
+ @torch.inference_mode()
100
+ def forward(
101
+ self,
102
+ image: torch.Tensor,
103
+ extrinsics: torch.Tensor | None = None,
104
+ intrinsics: torch.Tensor | None = None,
105
+ export_feat_layers: list[int] | None = None,
106
+ infer_gs: bool = False,
107
+ ) -> dict[str, torch.Tensor]:
108
+ """
109
+ Forward pass through the model.
110
+
111
+ Args:
112
+ image: Input batch with shape ``(B, N, 3, H, W)`` on the model device.
113
+ extrinsics: Optional camera extrinsics with shape ``(B, N, 4, 4)``.
114
+ intrinsics: Optional camera intrinsics with shape ``(B, N, 3, 3)``.
115
+ export_feat_layers: Layer indices to return intermediate features for.
116
+
117
+ Returns:
118
+ Dictionary containing model predictions
119
+ """
120
+ # Determine optimal autocast dtype
121
+ autocast_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
122
+ with torch.no_grad():
123
+ with torch.autocast(device_type=image.device.type, dtype=autocast_dtype):
124
+ return self.model(image, extrinsics, intrinsics, export_feat_layers, infer_gs)
125
+
126
+ def inference(
127
+ self,
128
+ image: list[np.ndarray | Image.Image | str],
129
+ extrinsics: np.ndarray | None = None,
130
+ intrinsics: np.ndarray | None = None,
131
+ align_to_input_ext_scale: bool = True,
132
+ infer_gs: bool = False,
133
+ render_exts: np.ndarray | None = None,
134
+ render_ixts: np.ndarray | None = None,
135
+ render_hw: tuple[int, int] | None = None,
136
+ process_res: int = 504,
137
+ process_res_method: str = "upper_bound_resize",
138
+ export_dir: str | None = None,
139
+ export_format: str = "mini_npz",
140
+ export_feat_layers: Sequence[int] | None = None,
141
+ # GLB export parameters
142
+ conf_thresh_percentile: float = 40.0,
143
+ num_max_points: int = 1_000_000,
144
+ show_cameras: bool = True,
145
+ # Feat_vis export parameters
146
+ feat_vis_fps: int = 15,
147
+ export_kwargs: Optional[dict] = {},
148
+ ) -> Prediction:
149
+ """
150
+ Run inference on input images.
151
+
152
+ Args:
153
+ image: List of input images (numpy arrays, PIL Images, or file paths)
154
+ extrinsics: Camera extrinsics (N, 4, 4)
155
+ intrinsics: Camera intrinsics (N, 3, 3)
156
+ align_to_input_ext_scale: whether to align the input pose scale to the prediction
157
+ infer_gs: Enable the 3D Gaussian branch (needed for `gs_ply`/`gs_video` exports)
158
+ render_exts: Optional render extrinsics for Gaussian video export
159
+ render_ixts: Optional render intrinsics for Gaussian video export
160
+ render_hw: Optional render resolution for Gaussian video export
161
+ process_res: Processing resolution
162
+ process_res_method: Resize method for processing
163
+ export_dir: Directory to export results
164
+ export_format: Export format (mini_npz, npz, glb, ply, gs, gs_video)
165
+ export_feat_layers: Layer indices to export intermediate features from
166
+ conf_thresh_percentile: [GLB] Lower percentile for adaptive confidence threshold (default: 40.0) # noqa: E501
167
+ num_max_points: [GLB] Maximum number of points in the point cloud (default: 1,000,000)
168
+ show_cameras: [GLB] Show camera wireframes in the exported scene (default: True)
169
+ feat_vis_fps: [FEAT_VIS] Frame rate for output video (default: 15)
170
+ export_kwargs: additional arguments to export functions.
171
+
172
+ Returns:
173
+ Prediction object containing depth maps and camera parameters
174
+ """
175
+ if "gs" in export_format:
176
+ assert infer_gs, "must set `infer_gs=True` to perform gs-related export."
177
+
178
+ # Preprocess images
179
+ imgs_cpu, extrinsics, intrinsics = self._preprocess_inputs(
180
+ image, extrinsics, intrinsics, process_res, process_res_method
181
+ )
182
+
183
+ # Prepare tensors for model
184
+ imgs, ex_t, in_t = self._prepare_model_inputs(imgs_cpu, extrinsics, intrinsics)
185
+
186
+ # Normalize extrinsics
187
+ ex_t_norm = self._normalize_extrinsics(ex_t.clone() if ex_t is not None else None)
188
+
189
+ # Run model forward pass
190
+ export_feat_layers = list(export_feat_layers) if export_feat_layers is not None else []
191
+
192
+ raw_output = self._run_model_forward(imgs, ex_t_norm, in_t, export_feat_layers, infer_gs)
193
+
194
+ # Convert raw output to prediction
195
+ prediction = self._convert_to_prediction(raw_output)
196
+
197
+ # Align prediction to extrinsincs
198
+ prediction = self._align_to_input_extrinsics_intrinsics(
199
+ extrinsics, intrinsics, prediction, align_to_input_ext_scale
200
+ )
201
+
202
+ # Add processed images for visualization
203
+ prediction = self._add_processed_images(prediction, imgs_cpu)
204
+
205
+ # Export if requested
206
+ if export_dir is not None:
207
+
208
+ if "gs" in export_format:
209
+ if infer_gs and "gs_video" not in export_format:
210
+ export_format = f"{export_format}-gs_video"
211
+ if "gs_video" in export_format:
212
+ if "gs_video" not in export_kwargs:
213
+ export_kwargs["gs_video"] = {}
214
+ export_kwargs["gs_video"].update(
215
+ {
216
+ "extrinsics": render_exts,
217
+ "intrinsics": render_ixts,
218
+ "out_image_hw": render_hw,
219
+ }
220
+ )
221
+ # Add GLB export parameters
222
+ if "glb" in export_format:
223
+ if "glb" not in export_kwargs:
224
+ export_kwargs["glb"] = {}
225
+ export_kwargs["glb"].update(
226
+ {
227
+ "conf_thresh_percentile": conf_thresh_percentile,
228
+ "num_max_points": num_max_points,
229
+ "show_cameras": show_cameras,
230
+ }
231
+ )
232
+ # Add Feat_vis export parameters
233
+ if "feat_vis" in export_format:
234
+ if "feat_vis" not in export_kwargs:
235
+ export_kwargs["feat_vis"] = {}
236
+ export_kwargs["feat_vis"].update(
237
+ {
238
+ "fps": feat_vis_fps,
239
+ }
240
+ )
241
+ self._export_results(prediction, export_format, export_dir, **export_kwargs)
242
+
243
+ return prediction
244
+
245
+ def _preprocess_inputs(
246
+ self,
247
+ image: list[np.ndarray | Image.Image | str],
248
+ extrinsics: np.ndarray | None = None,
249
+ intrinsics: np.ndarray | None = None,
250
+ process_res: int = 504,
251
+ process_res_method: str = "upper_bound_resize",
252
+ ) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | None]:
253
+ """Preprocess input images using input processor."""
254
+ start_time = time.time()
255
+ imgs_cpu, extrinsics, intrinsics = self.input_processor(
256
+ image,
257
+ extrinsics.copy() if extrinsics is not None else None,
258
+ intrinsics.copy() if intrinsics is not None else None,
259
+ process_res,
260
+ process_res_method,
261
+ )
262
+ end_time = time.time()
263
+ logger.info(
+ f"Processed Images Done. Time: {end_time - start_time} seconds. "
+ f"Shape: {imgs_cpu.shape}"
+ )
269
+ return imgs_cpu, extrinsics, intrinsics
270
+
271
+ def _prepare_model_inputs(
272
+ self,
273
+ imgs_cpu: torch.Tensor,
274
+ extrinsics: torch.Tensor | None,
275
+ intrinsics: torch.Tensor | None,
276
+ ) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor | None]:
277
+ """Prepare tensors for model input."""
278
+ device = self._get_model_device()
279
+
280
+ # Move images to model device
281
+ imgs = imgs_cpu.to(device, non_blocking=True)[None].float()
282
+
283
+ # Convert camera parameters to tensors
284
+ ex_t = (
285
+ extrinsics.to(device, non_blocking=True)[None].float()
286
+ if extrinsics is not None
287
+ else None
288
+ )
289
+ in_t = (
290
+ intrinsics.to(device, non_blocking=True)[None].float()
291
+ if intrinsics is not None
292
+ else None
293
+ )
294
+
295
+ return imgs, ex_t, in_t
296
+
297
+ def _normalize_extrinsics(self, ex_t: torch.Tensor | None) -> torch.Tensor | None:
298
+ """Normalize extrinsics"""
299
+ if ex_t is None:
300
+ return None
301
+ transform = affine_inverse(ex_t[:, :1])
302
+ ex_t_norm = ex_t @ transform
303
+ c2ws = affine_inverse(ex_t_norm)
304
+ translations = c2ws[..., :3, 3]
305
+ dists = translations.norm(dim=-1)
306
+ median_dist = torch.median(dists)
307
+ median_dist = torch.clamp(median_dist, min=1e-1)
308
+ ex_t_norm[..., :3, 3] = ex_t_norm[..., :3, 3] / median_dist
309
+ return ex_t_norm
310
+
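For clarity, here is a minimal standalone sketch of the normalization performed above: poses are re-expressed relative to the first camera, then translations are rescaled so the median camera distance becomes roughly one (clamped at 0.1). The small `affine_inverse` helper below is only a stand-in for the library function of the same name, assumed to invert batched 4x4 rigid transforms.

```python
import torch

def affine_inverse(T: torch.Tensor) -> torch.Tensor:
    """Invert batched 4x4 rigid transforms [R|t] (stand-in for the library helper)."""
    R, t = T[..., :3, :3], T[..., :3, 3:]
    inv = torch.zeros_like(T)
    inv[..., :3, :3] = R.transpose(-1, -2)
    inv[..., :3, 3:] = -R.transpose(-1, -2) @ t
    inv[..., 3, 3] = 1.0
    return inv

# Three world-to-camera poses (identity rotation, cameras 0, 2 and 4 units apart).
ex_t = torch.eye(4).repeat(1, 3, 1, 1)
ex_t[0, 1, 0, 3] = 2.0
ex_t[0, 2, 0, 3] = 4.0

# Re-express all poses in the first camera's frame ...
ex_t_norm = ex_t @ affine_inverse(ex_t[:, :1])
# ... then rescale translations so the median camera distance becomes ~1.
dists = affine_inverse(ex_t_norm)[..., :3, 3].norm(dim=-1)
ex_t_norm[..., :3, 3] /= torch.clamp(torch.median(dists), min=1e-1)
print(affine_inverse(ex_t_norm)[..., :3, 3])  # camera centres now 0, 1 and 2 units away
```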
311
+ def _align_to_input_extrinsics_intrinsics(
312
+ self,
313
+ extrinsics: torch.Tensor | None,
314
+ intrinsics: torch.Tensor | None,
315
+ prediction: Prediction,
316
+ align_to_input_ext_scale: bool = True,
317
+ ransac_view_thresh: int = 10,
318
+ ) -> Prediction:
319
+ """Align depth map to input extrinsics"""
320
+ if extrinsics is None:
321
+ return prediction
322
+ prediction.intrinsics = intrinsics.numpy()
323
+ _, _, scale, aligned_extrinsics = align_poses_umeyama(
324
+ prediction.extrinsics,
325
+ extrinsics.numpy(),
326
+ ransac=len(extrinsics) >= ransac_view_thresh,
327
+ return_aligned=True,
328
+ random_state=42,
329
+ )
330
+ if align_to_input_ext_scale:
331
+ prediction.extrinsics = extrinsics[..., :3, :].numpy()
332
+ prediction.depth /= scale
333
+ else:
334
+ prediction.extrinsics = aligned_extrinsics
335
+ return prediction
336
+
337
+ def _run_model_forward(
338
+ self,
339
+ imgs: torch.Tensor,
340
+ ex_t: torch.Tensor | None,
341
+ in_t: torch.Tensor | None,
342
+ export_feat_layers: Sequence[int] | None = None,
343
+ infer_gs: bool = False,
344
+ ) -> dict[str, torch.Tensor]:
345
+ """Run model forward pass."""
346
+ device = imgs.device
347
+ need_sync = device.type == "cuda"
348
+ if need_sync:
349
+ torch.cuda.synchronize(device)
350
+ start_time = time.time()
351
+ feat_layers = list(export_feat_layers) if export_feat_layers is not None else None
352
+ output = self.forward(imgs, ex_t, in_t, feat_layers, infer_gs)
353
+ if need_sync:
354
+ torch.cuda.synchronize(device)
355
+ end_time = time.time()
356
+ logger.info(f"Model Forward Pass Done. Time: {end_time - start_time} seconds")
357
+ return output
358
+
359
+ def _convert_to_prediction(self, raw_output: dict[str, torch.Tensor]) -> Prediction:
360
+ """Convert raw model output to Prediction object."""
361
+ start_time = time.time()
362
+ output = self.output_processor(raw_output)
363
+ end_time = time.time()
364
+ logger.info(f"Conversion to Prediction Done. Time: {end_time - start_time} seconds")
365
+ return output
366
+
367
+ def _add_processed_images(self, prediction: Prediction, imgs_cpu: torch.Tensor) -> Prediction:
368
+ """Add processed images to prediction for visualization."""
369
+ # Convert from (N, 3, H, W) to (N, H, W, 3) and denormalize
370
+ processed_imgs = imgs_cpu.permute(0, 2, 3, 1).cpu().numpy() # (N, H, W, 3)
371
+
372
+ # Denormalize from ImageNet normalization
373
+ mean = np.array([0.485, 0.456, 0.406])
374
+ std = np.array([0.229, 0.224, 0.225])
375
+ processed_imgs = processed_imgs * std + mean
376
+ processed_imgs = np.clip(processed_imgs, 0, 1)
377
+ processed_imgs = (processed_imgs * 255).astype(np.uint8)
378
+
379
+ prediction.processed_images = processed_imgs
380
+ return prediction
381
+
382
+ def _export_results(
383
+ self, prediction: Prediction, export_format: str, export_dir: str, **kwargs
384
+ ) -> None:
385
+ """Export results to specified format and directory."""
386
+ start_time = time.time()
387
+ export(prediction, export_format, export_dir, **kwargs)
388
+ end_time = time.time()
389
+ logger.info(f"Export Results Done. Time: {end_time - start_time} seconds")
390
+
391
+ def _get_model_device(self) -> torch.device:
392
+ """
393
+ Get the device where the model is located.
394
+
395
+ Returns:
396
+ Device where the model parameters are located
397
+
398
+ Raises:
399
+ ValueError: If no tensors are found in the model
400
+ """
401
+ if self.device is not None:
402
+ return self.device
403
+
404
+ # Find device from parameters
405
+ for param in self.parameters():
406
+ self.device = param.device
407
+ return param.device
408
+
409
+ # Find device from buffers
410
+ for buffer in self.buffers():
411
+ self.device = buffer.device
412
+ return buffer.device
413
+
414
+ raise ValueError("No tensor found in model")
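A hedged usage sketch of the `inference` entry point documented above. The loader below (`load_model`) is a hypothetical placeholder for however the package actually constructs the model; the keyword arguments and the `Prediction` attributes mirror the code above, and the checkpoint name matches the Gradio app's default.

```python
from depth_anything_3.api import load_model  # hypothetical loader name

model = load_model("depth-anything/DA3NESTED-GIANT-LARGE")

prediction = model.inference(
    image=["frames/0001.jpg", "frames/0002.jpg"],  # paths, arrays or PIL images
    process_res=504,
    process_res_method="upper_bound_resize",
    export_dir="outputs/scene01",
    export_format="glb",               # or mini_npz / npz / ply / gs / gs_video
    conf_thresh_percentile=40.0,       # [GLB] adaptive confidence threshold
    num_max_points=1_000_000,          # [GLB] point-cloud size cap
    show_cameras=True,                 # [GLB] draw camera wireframes
)
print(prediction.depth.shape, prediction.extrinsics.shape)
```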
src/depth_anything_3/app/__pycache__/css_and_html.cpython-311.pyc ADDED
Binary file (18.5 kB). View file
 
src/depth_anything_3/app/__pycache__/gradio_app.cpython-311.pyc ADDED
Binary file (27.9 kB). View file
 
src/depth_anything_3/app/css_and_html.py ADDED
@@ -0,0 +1,594 @@
1
+ # flake8: noqa: E501
2
+
3
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+
17
+ """
18
+ CSS and HTML content for the Depth Anything 3 Gradio application.
19
+ This module contains all the CSS styles and HTML content blocks
20
+ used in the Gradio interface.
21
+ """
22
+
23
+ # CSS Styles for the Gradio interface
24
+ GRADIO_CSS = """
25
+ /* Add Font Awesome CDN with all styles including brands and colors */
26
+ @import url('https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css');
27
+
28
+ /* Add custom styles for colored icons */
29
+ .fa-color-blue {
30
+ color: #3b82f6;
31
+ }
32
+
33
+ .fa-color-purple {
34
+ color: #8b5cf6;
35
+ }
36
+
37
+ .fa-color-cyan {
38
+ color: #06b6d4;
39
+ }
40
+
41
+ .fa-color-green {
42
+ color: #10b981;
43
+ }
44
+
45
+ .fa-color-yellow {
46
+ color: #f59e0b;
47
+ }
48
+
49
+ .fa-color-red {
50
+ color: #ef4444;
51
+ }
52
+
53
+ .link-btn {
54
+ display: inline-flex;
55
+ align-items: center;
56
+ gap: 8px;
57
+ text-decoration: none;
58
+ padding: 12px 24px;
59
+ border-radius: 50px;
60
+ font-weight: 500;
61
+ transition: all 0.3s ease;
62
+ }
63
+
64
+ /* Dark mode tech theme */
65
+ @media (prefers-color-scheme: dark) {
66
+ html, body {
67
+ background: #1e293b;
68
+ color: #ffffff;
69
+ }
70
+
71
+ .gradio-container {
72
+ background: #1e293b;
73
+ color: #ffffff;
74
+ }
75
+
76
+ .link-btn {
77
+ background: rgba(255, 255, 255, 0.2);
78
+ color: white;
79
+ backdrop-filter: blur(10px);
80
+ border: 1px solid rgba(255, 255, 255, 0.3);
81
+ }
82
+
83
+ .link-btn:hover {
84
+ background: rgba(255, 255, 255, 0.3);
85
+ transform: translateY(-2px);
86
+ box-shadow: 0 8px 25px rgba(0, 0, 0, 0.2);
87
+ }
88
+
89
+ .tech-bg {
90
+ background: linear-gradient(135deg, #0f172a, #1e293b); /* Darker colors */
91
+ position: relative;
92
+ overflow: hidden;
93
+ }
94
+
95
+ .tech-bg::before {
96
+ content: '';
97
+ position: absolute;
98
+ top: 0;
99
+ left: 0;
100
+ right: 0;
101
+ bottom: 0;
102
+ background:
103
+ radial-gradient(circle at 20% 80%, rgba(59, 130, 246, 0.15) 0%, transparent 50%), /* Reduced opacity */
104
+ radial-gradient(circle at 80% 20%, rgba(139, 92, 246, 0.15) 0%, transparent 50%), /* Reduced opacity */
105
+ radial-gradient(circle at 40% 40%, rgba(18, 194, 233, 0.1) 0%, transparent 50%); /* Reduced opacity */
106
+ animation: techPulse 8s ease-in-out infinite;
107
+ }
108
+
109
+ .gradio-container .panel,
110
+ .gradio-container .block,
111
+ .gradio-container .form {
112
+ background: rgba(0, 0, 0, 0.3);
113
+ border: 1px solid rgba(59, 130, 246, 0.2);
114
+ border-radius: 10px;
115
+ }
116
+
117
+ .gradio-container * {
118
+ color: #ffffff;
119
+ }
120
+
121
+ .gradio-container label {
122
+ color: #e0e0e0;
123
+ }
124
+
125
+ .gradio-container .markdown {
126
+ color: #e0e0e0;
127
+ }
128
+ }
129
+
130
+ /* Light mode tech theme */
131
+ @media (prefers-color-scheme: light) {
132
+ html, body {
133
+ background: #ffffff;
134
+ color: #1e293b;
135
+ }
136
+
137
+ .gradio-container {
138
+ background: #ffffff;
139
+ color: #1e293b;
140
+ }
141
+
142
+ .tech-bg {
143
+ background: linear-gradient(135deg, #ffffff, #f1f5f9);
144
+ position: relative;
145
+ overflow: hidden;
146
+ }
147
+
148
+ .link-btn {
149
+ background: rgba(59, 130, 246, 0.15);
150
+ color: var(--body-text-color);
151
+ border: 1px solid rgba(59, 130, 246, 0.3);
152
+ }
153
+
154
+ .link-btn:hover {
155
+ background: rgba(59, 130, 246, 0.25);
156
+ transform: translateY(-2px);
157
+ box-shadow: 0 8px 25px rgba(59, 130, 246, 0.2);
158
+ }
159
+
160
+ .tech-bg::before {
161
+ content: '';
162
+ position: absolute;
163
+ top: 0;
164
+ left: 0;
165
+ right: 0;
166
+ bottom: 0;
167
+ background:
168
+ radial-gradient(circle at 20% 80%, rgba(59, 130, 246, 0.1) 0%, transparent 50%),
169
+ radial-gradient(circle at 80% 20%, rgba(139, 92, 246, 0.1) 0%, transparent 50%),
170
+ radial-gradient(circle at 40% 40%, rgba(18, 194, 233, 0.08) 0%, transparent 50%);
171
+ animation: techPulse 8s ease-in-out infinite;
172
+ }
173
+
174
+ .gradio-container .panel,
175
+ .gradio-container .block,
176
+ .gradio-container .form {
177
+ background: rgba(255, 255, 255, 0.8);
178
+ border: 1px solid rgba(59, 130, 246, 0.3);
179
+ border-radius: 10px;
180
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
181
+ }
182
+
183
+ .gradio-container * {
184
+ color: #1e293b;
185
+ }
186
+
187
+ .gradio-container label {
188
+ color: #334155;
189
+ }
190
+
191
+ .gradio-container .markdown {
192
+ color: #334155;
193
+ }
194
+ }
195
+
196
+
197
+
198
+
199
+ @keyframes techPulse {
200
+ 0%, 100% { opacity: 0.5; }
201
+ 50% { opacity: 0.8; }
202
+ }
203
+
204
+ /* Custom log with tech gradient */
205
+ .custom-log * {
206
+ font-style: italic;
207
+ font-size: 22px !important;
208
+ background: linear-gradient(135deg, #3b82f6, #8b5cf6);
209
+ background-size: 400% 400%;
210
+ -webkit-background-clip: text;
211
+ background-clip: text;
212
+ font-weight: bold !important;
213
+ color: transparent !important;
214
+ text-align: center !important;
215
+ animation: techGradient 3s ease infinite;
216
+ }
217
+
218
+ @keyframes techGradient {
219
+ 0% { background-position: 0% 50%; }
220
+ 50% { background-position: 100% 50%; }
221
+ 100% { background-position: 0% 50%; }
222
+ }
223
+
224
+ @keyframes metricPulse {
225
+ 0%, 100% { background-position: 0% 50%; }
226
+ 50% { background-position: 100% 50%; }
227
+ }
228
+
229
+ @keyframes pointcloudPulse {
230
+ 0%, 100% { background-position: 0% 50%; }
231
+ 50% { background-position: 100% 50%; }
232
+ }
233
+
234
+ @keyframes camerasPulse {
235
+ 0%, 100% { background-position: 0% 50%; }
236
+ 50% { background-position: 100% 50%; }
237
+ }
238
+
239
+ @keyframes gaussiansPulse {
240
+ 0%, 100% { background-position: 0% 50%; }
241
+ 50% { background-position: 100% 50%; }
242
+ }
243
+
244
+ /* Special colors for key terms - Global styles */
245
+ .metric-text {
246
+ background: linear-gradient(45deg, #ff6b6b, #ff8e53, #ff6b6b);
247
+ background-size: 200% 200%;
248
+ -webkit-background-clip: text;
249
+ background-clip: text;
250
+ color: transparent !important;
251
+ animation: metricPulse 2s ease-in-out infinite;
252
+ font-weight: 700;
253
+ text-shadow: 0 0 10px rgba(255, 107, 107, 0.5);
254
+ }
255
+
256
+ .pointcloud-text {
257
+ background: linear-gradient(45deg, #4ecdc4, #44a08d, #4ecdc4);
258
+ background-size: 200% 200%;
259
+ -webkit-background-clip: text;
260
+ background-clip: text;
261
+ color: transparent !important;
262
+ animation: pointcloudPulse 2.5s ease-in-out infinite;
263
+ font-weight: 700;
264
+ text-shadow: 0 0 10px rgba(78, 205, 196, 0.5);
265
+ }
266
+
267
+ .cameras-text {
268
+ background: linear-gradient(45deg, #667eea, #764ba2, #667eea);
269
+ background-size: 200% 200%;
270
+ -webkit-background-clip: text;
271
+ background-clip: text;
272
+ color: transparent !important;
273
+ animation: camerasPulse 3s ease-in-out infinite;
274
+ font-weight: 700;
275
+ text-shadow: 0 0 10px rgba(102, 126, 234, 0.5);
276
+ }
277
+
278
+ .gaussians-text {
279
+ background: linear-gradient(45deg, #f093fb, #f5576c, #f093fb);
280
+ background-size: 200% 200%;
281
+ -webkit-background-clip: text;
282
+ background-clip: text;
283
+ color: transparent !important;
284
+ animation: gaussiansPulse 2.2s ease-in-out infinite;
285
+ font-weight: 700;
286
+ text-shadow: 0 0 10px rgba(240, 147, 251, 0.5);
287
+ }
288
+
289
+ .example-log * {
290
+ font-style: italic;
291
+ font-size: 16px !important;
292
+ background: linear-gradient(135deg, #3b82f6, #8b5cf6);
293
+ -webkit-background-clip: text;
294
+ background-clip: text;
295
+ color: transparent !important;
296
+ }
297
+
298
+ #my_radio .wrap {
299
+ display: flex;
300
+ flex-wrap: nowrap;
301
+ justify-content: center;
302
+ align-items: center;
303
+ }
304
+
305
+ #my_radio .wrap label {
306
+ display: flex;
307
+ width: 50%;
308
+ justify-content: center;
309
+ align-items: center;
310
+ margin: 0;
311
+ padding: 10px 0;
312
+ box-sizing: border-box;
313
+ }
314
+
315
+ /* Align navigation buttons with dropdown bottom */
316
+ .navigation-row {
317
+ display: flex !important;
318
+ align-items: flex-end !important;
319
+ gap: 8px !important;
320
+ }
321
+
322
+ .navigation-row > div:nth-child(1),
323
+ .navigation-row > div:nth-child(3) {
324
+ align-self: flex-end !important;
325
+ }
326
+
327
+ .navigation-row > div:nth-child(2) {
328
+ flex: 1 !important;
329
+ }
330
+
331
+ /* Make thumbnails clickable with pointer cursor */
332
+ .clickable-thumbnail img {
333
+ cursor: pointer !important;
334
+ }
335
+
336
+ .clickable-thumbnail:hover img {
337
+ cursor: pointer !important;
338
+ opacity: 0.8;
339
+ transition: opacity 0.3s ease;
340
+ }
341
+
342
+ /* Make thumbnail containers narrower horizontally */
343
+ .clickable-thumbnail {
344
+ padding: 5px 2px !important;
345
+ margin: 0 2px !important;
346
+ }
347
+
348
+ .clickable-thumbnail .image-container {
349
+ margin: 0 !important;
350
+ padding: 0 !important;
351
+ }
352
+
353
+ .scene-info {
354
+ text-align: center !important;
355
+ padding: 5px 2px !important;
356
+ margin: 0 !important;
357
+ }
358
+ """
359
+
360
+
361
+ def get_header_html(logo_base64=None):
362
+ """
363
+ Generate the main header HTML with logo and title.
364
+
365
+ Args:
366
+ logo_base64 (str, optional): Base64 encoded logo image
367
+
368
+ Returns:
369
+ str: HTML string for the header
370
+ """
371
+ return """
372
+ <div class="tech-bg" style="text-align: center; margin-bottom: 5px; padding: 40px 20px; border-radius: 15px; position: relative; overflow: hidden;">
373
+ <div style="position: relative; z-index: 2;">
374
+ <h1 style="margin: 0; font-size: 3.5em; font-weight: 700;
375
+ background: linear-gradient(135deg, #3b82f6, #8b5cf6);
376
+ background-size: 400% 400%;
377
+ -webkit-background-clip: text;
378
+ background-clip: text;
379
+ color: transparent;
380
+ animation: techGradient 3s ease infinite;
381
+ text-shadow: 0 0 30px rgba(59, 130, 246, 0.5);
382
+ letter-spacing: 2px;">
383
+ Depth Anything 3
384
+ </h1>
385
+ <p style="margin: 15px 0 0 0; font-size: 2.16em; font-weight: 300;" class="header-subtitle">
386
+ Recovering the Visual Space from Any Views
387
+ </p>
388
+ <div style="margin-top: 20px;">
389
+ <!-- Revert buttons to original inline styles -->
390
+ <a href="https://depth-anything-3.github.io" target="_blank" class="link-btn">
391
+ <i class="fas fa-globe" style="margin-right: 8px;"></i> Project Page
392
+ </a>
393
+ <a href="https://arxiv.org/abs/2406.09414" target="_blank" class="link-btn">
394
+ <i class="fas fa-file-pdf" style="margin-right: 8px;"></i> Paper
395
+ </a>
396
+ <a href="https://github.com/ByteDance-Seed/Depth-Anything-3" target="_blank" class="link-btn">
397
+ <i class="fab fa-github" style="margin-right: 8px;"></i> Code
398
+ </a>
399
+ </div>
400
+ </div>
401
+ </div>
402
+
403
+ <style>
404
+ /* Ensure tech-bg class is properly applied in dark mode */
405
+ @media (prefers-color-scheme: dark) {
406
+ .header-subtitle {
407
+ color: #cbd5e1;
408
+ }
409
+ /* Increase priority to ensure background color is properly applied */
410
+ .tech-bg {
411
+ background: linear-gradient(135deg, #0f172a, #1e293b) !important;
412
+ }
413
+ }
414
+
415
+ @media (prefers-color-scheme: light) {
416
+ .header-subtitle {
417
+ color: #475569;
418
+ }
419
+ /* Also add explicit background color for light mode */
420
+ .tech-bg {
421
+ background: linear-gradient(135deg, rgba(59, 130, 246, 0.1) 0%, rgba(139, 92, 246, 0.1) 100%) !important;
422
+ }
423
+ }
424
+ </style>
425
+ """
426
+
427
+
428
+ def get_description_html():
429
+ """
430
+ Generate the main description and getting started HTML.
431
+
432
+ Returns:
433
+ str: HTML string for the description
434
+ """
435
+ return """
436
+ <div class="description-container" style="padding: 25px; border-radius: 15px; margin: 0 0 20px 0;">
437
+ <h2 class="description-title" style="margin-top: 0; font-size: 1.6em; text-align: center;">
438
+ <i class="fas fa-bullseye fa-color-red" style="margin-right: 8px;"></i> What This Demo Does
439
+ </h2>
440
+ <div class="description-content" style="padding: 20px; border-radius: 10px; margin: 15px 0; text-align: center;">
441
+ <p class="description-main" style="line-height: 1.6; margin: 0; font-size: 1.45em;">
442
+ <strong>Upload images or videos</strong> → <strong>Get <span class="metric-text">Metric</span> <span class="pointcloud-text">Point Clouds</span>, <span class="cameras-text">Cameras</span> and <span class="gaussians-text">Novel Views</span></strong> → <strong>Explore in 3D</strong>
443
+ </p>
444
+ </div>
445
+
446
+ <div style="text-align: center; margin-top: 15px;">
447
+ <p class="description-tip" style="font-style: italic; margin: 0;">
448
+ <i class="fas fa-lightbulb fa-color-yellow" style="margin-right: 8px;"></i> <strong>Tip:</strong> Landscape-oriented images or videos are preferred for best 3D recovering.
449
+ </p>
450
+ </div>
451
+ </div>
452
+
453
+ <style>
454
+ @media (prefers-color-scheme: dark) {
455
+ .description-container {
456
+ background: linear-gradient(135deg, rgba(59, 130, 246, 0.1) 0%, rgba(139, 92, 246, 0.1) 100%);
457
+ border: 1px solid rgba(59, 130, 246, 0.2);
458
+ }
459
+ .description-title { color: #3b82f6; }
460
+ .description-content { background: rgba(0, 0, 0, 0.3); }
461
+ .description-main { color: #e0e0e0; }
462
+ .description-text { color: #cbd5e1; }
463
+ .description-tip { color: #cbd5e1; }
464
+ }
465
+
466
+ @media (prefers-color-scheme: light) {
467
+ .description-container {
468
+ background: linear-gradient(135deg, rgba(59, 130, 246, 0.05) 0%, rgba(139, 92, 246, 0.05) 100%);
469
+ border: 1px solid rgba(59, 130, 246, 0.3);
470
+ }
471
+ .description-title { color: #3b82f6; }
472
+ .description-content { background: transparent; }
473
+ .description-main { color: #1e293b; }
474
+ .description-text { color: #475569; }
475
+ .description-tip { color: #475569; }
476
+ }
477
+ </style>
478
+ """
479
+
480
+
481
+ def get_acknowledgements_html():
482
+ """
483
+ Generate the acknowledgements section HTML.
484
+
485
+ Returns:
486
+ str: HTML string for the acknowledgements
487
+ """
488
+ return """
489
+ <div style="background: linear-gradient(135deg, rgba(59, 130, 246, 0.1) 0%, rgba(139, 92, 246, 0.1) 100%);
490
+ padding: 25px; border-radius: 15px; margin: 20px 0; border: 1px solid rgba(59, 130, 246, 0.2);">
491
+ <h3 style="color: #3b82f6; margin-top: 0; text-align: center; font-size: 1.4em;">
492
+ <i class="fas fa-trophy fa-color-yellow" style="margin-right: 8px;"></i> Research Credits & Acknowledgments
493
+ </h3>
494
+
495
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 15px 0;">
496
+ <!-- Original Research Section (Left) -->
497
+ <div style="text-align: center;">
498
+ <h4 style="color: #8b5cf6; margin: 10px 0;"><i class="fas fa-flask fa-color-green" style="margin-right: 8px;"></i> Original Research</h4>
499
+ <p style="color: #e0e0e0; margin: 5px 0;">
500
+ <a href="https://depth-anything-3.github.io" target="_blank"
501
+ style="color: #3b82f6; text-decoration: none; font-weight: 600;">
502
+ Depth Anything 3
503
+ </a>
504
+ </p>
505
+ </div>
506
+
507
+ <!-- Previous Versions Section (Right) -->
508
+ <div style="text-align: center;">
509
+ <h4 style="color: #8b5cf6; margin: 10px 0;"><i class="fas fa-history fa-color-blue" style="margin-right: 8px;"></i> Previous Versions</h4>
510
+ <div style="display: flex; flex-direction: row; gap: 15px; justify-content: center; align-items: center;">
511
+ <p style="color: #e0e0e0; margin: 0;">
512
+ <a href="https://huggingface.co/spaces/LiheYoung/Depth-Anything" target="_blank"
513
+ style="color: #3b82f6; text-decoration: none; font-weight: 600;">
514
+ Depth-Anything
515
+ </a>
516
+ </p>
517
+ <span style="color: #e0e0e0;">•</span>
518
+ <p style="color: #e0e0e0; margin: 0;">
519
+ <a href="https://huggingface.co/spaces/depth-anything/Depth-Anything-V2" target="_blank"
520
+ style="color: #3b82f6; text-decoration: none; font-weight: 600;">
521
+ Depth-Anything-V2
522
+ </a>
523
+ </p>
524
+ </div>
525
+ </div>
526
+ </div>
527
+
528
+ <!-- HF Demo Adapted from - Centered at the bottom of the whole block -->
529
+ <div style="margin-top: 20px; padding-top: 15px; border-top: 1px solid rgba(59, 130, 246, 0.3); text-align: center;">
530
+ <p style="color: #a0a0a0; font-size: 0.9em; margin: 0;">
531
+ <i class="fas fa-code-branch fa-color-gray" style="margin-right: 5px;"></i> HF demo adapted from <a href="https://huggingface.co/spaces/facebook/map-anything" target="_blank" style="color: inherit; text-decoration: none;">Map Anything</a>
532
+ </p>
533
+ </div>
534
+ </div>
535
+ """
536
+
537
+
538
+ def get_gradio_theme():
539
+ """
540
+ Get the configured Gradio theme with adaptive tech colors.
541
+
542
+ Returns:
543
+ gr.themes.Base: Configured Gradio theme
544
+ """
545
+ import gradio as gr
546
+
547
+ return gr.themes.Base(
548
+ primary_hue=gr.themes.Color(
549
+ c50="#eff6ff",
550
+ c100="#dbeafe",
551
+ c200="#bfdbfe",
552
+ c300="#93c5fd",
553
+ c400="#60a5fa",
554
+ c500="#3b82f6",
555
+ c600="#2563eb",
556
+ c700="#1d4ed8",
557
+ c800="#1e40af",
558
+ c900="#1e3a8a",
559
+ c950="#172554",
560
+ ),
561
+ secondary_hue=gr.themes.Color(
562
+ c50="#f5f3ff",
563
+ c100="#ede9fe",
564
+ c200="#ddd6fe",
565
+ c300="#c4b5fd",
566
+ c400="#a78bfa",
567
+ c500="#8b5cf6",
568
+ c600="#7c3aed",
569
+ c700="#6d28d9",
570
+ c800="#5b21b6",
571
+ c900="#4c1d95",
572
+ c950="#2e1065",
573
+ ),
574
+ neutral_hue=gr.themes.Color(
575
+ c50="#f8fafc",
576
+ c100="#f1f5f9",
577
+ c200="#e2e8f0",
578
+ c300="#cbd5e1",
579
+ c400="#94a3b8",
580
+ c500="#64748b",
581
+ c600="#475569",
582
+ c700="#334155",
583
+ c800="#1e293b",
584
+ c900="#0f172a",
585
+ c950="#020617",
586
+ ),
587
+ )
588
+
589
+
590
+ # Measure tab instructions HTML
591
+ MEASURE_INSTRUCTIONS_HTML = """
592
+ ### Click points on the image to compute distance.
593
+ > <i class="fas fa-triangle-exclamation fa-color-red" style="margin-right: 5px;"></i> Metric scale estimation is difficult on aerial/drone images.
594
+ """
src/depth_anything_3/app/gradio_app.py ADDED
@@ -0,0 +1,747 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Refactored Gradio App for Depth Anything 3.
17
+
18
+ This is the main application file that orchestrates all components.
19
+ The original functionality has been split into modular components for better maintainability.
20
+ """
21
+
22
+ import argparse
23
+ import os
24
+ from typing import Any, Dict, List
25
+ import gradio as gr
26
+
27
+ from depth_anything_3.app.css_and_html import GRADIO_CSS, get_gradio_theme
28
+ from depth_anything_3.app.modules.event_handlers import EventHandlers
29
+ from depth_anything_3.app.modules.ui_components import UIComponents
30
+
31
+ # Set environment variables
32
+ os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
33
+
34
+
35
+ class DepthAnything3App:
36
+ """
37
+ Main application class for Depth Anything 3 Gradio app.
38
+ """
39
+
40
+ def __init__(self, model_dir: str | None = None, workspace_dir: str | None = None, gallery_dir: str | None = None):
41
+ """
42
+ Initialize the application.
43
+
44
+ Args:
45
+ model_dir: Path to the model directory
46
+ workspace_dir: Path to the workspace directory
47
+ gallery_dir: Path to the gallery directory
48
+ """
49
+ self.model_dir = model_dir
50
+ self.workspace_dir = workspace_dir
51
+ self.gallery_dir = gallery_dir
52
+
53
+ # Set environment variables for directories
54
+ if self.model_dir:
55
+ os.environ["DA3_MODEL_DIR"] = self.model_dir
56
+ if self.workspace_dir:
57
+ os.environ["DA3_WORKSPACE_DIR"] = self.workspace_dir
58
+ if self.gallery_dir:
59
+ os.environ["DA3_GALLERY_DIR"] = self.gallery_dir
60
+
61
+ self.event_handlers = EventHandlers()
62
+ self.ui_components = UIComponents()
63
+
64
+ def cache_examples(
65
+ self,
66
+ show_cam: bool = True,
67
+ filter_black_bg: bool = False,
68
+ filter_white_bg: bool = False,
69
+ save_percentage: float = 20.0,
70
+ num_max_points: int = 1000,
71
+ cache_gs_tag: str = "",
72
+ gs_trj_mode: str = "smooth",
73
+ gs_video_quality: str = "low",
74
+ ) -> None:
75
+ """
76
+ Pre-cache all example scenes at startup.
77
+
78
+ Args:
79
+ show_cam: Whether to show camera in visualization
80
+ filter_black_bg: Whether to filter black background
81
+ filter_white_bg: Whether to filter white background
82
+ save_percentage: Filter percentage for point cloud
83
+ num_max_points: Maximum number of points
84
+ cache_gs_tag: Tag to match scene names for high-res+3DGS caching (e.g., "dl3dv")
85
+ gs_trj_mode: Trajectory mode for 3DGS
86
+ gs_video_quality: Video quality for 3DGS
87
+ """
88
+ from depth_anything_3.app.modules.utils import get_scene_info
89
+
90
+ examples_dir = os.path.join(self.workspace_dir, "examples")
91
+ if not os.path.exists(examples_dir):
92
+ print(f"Examples directory not found: {examples_dir}")
93
+ return
94
+
95
+ scenes = get_scene_info(examples_dir)
96
+ if not scenes:
97
+ print("No example scenes found to cache.")
98
+ return
99
+
100
+ print(f"\n{'='*60}")
101
+ print(f"Caching {len(scenes)} example scenes...")
102
+ print(f"{'='*60}\n")
103
+
104
+ for i, scene in enumerate(scenes, 1):
105
+ scene_name = scene["name"]
106
+
107
+ # Check if scene name matches the gs tag for high-res+3DGS caching
108
+ use_high_res_gs = cache_gs_tag and cache_gs_tag.lower() in scene_name.lower()
109
+
110
+ if use_high_res_gs:
111
+ print(f"[{i}/{len(scenes)}] Caching scene: {scene_name} (HIGH-RES + 3DGS)")
112
+ print(f" - Number of images: {scene['num_images']}")
113
+ print(f" - Matched tag: '{cache_gs_tag}' - using high_res + 3DGS")
114
+ else:
115
+ print(f"[{i}/{len(scenes)}] Caching scene: {scene_name} (LOW-RES)")
116
+ print(f" - Number of images: {scene['num_images']}")
117
+
118
+ try:
119
+ # Load example scene
120
+ _, target_dir, _, _, _, _, _, _, _ = self.event_handlers.load_example_scene(
121
+ scene_name
122
+ )
123
+
124
+ if target_dir and target_dir != "None":
125
+ # Run reconstruction with appropriate settings
126
+ print(" - Running reconstruction...")
127
+ result = self.event_handlers.gradio_demo(
128
+ target_dir=target_dir,
129
+ show_cam=show_cam,
130
+ filter_black_bg=filter_black_bg,
131
+ filter_white_bg=filter_white_bg,
132
+ process_res_method="high_res" if use_high_res_gs else "low_res",
133
+ selected_first_frame="",
134
+ save_percentage=save_percentage,
135
+ num_max_points=num_max_points,
136
+ infer_gs=use_high_res_gs,
137
+ gs_trj_mode=gs_trj_mode,
138
+ gs_video_quality=gs_video_quality,
139
+ )
140
+
141
+ # Check if successful
142
+ if result[0] is not None: # reconstruction_output
143
+ print(f" ✓ Scene '{scene_name}' cached successfully")
144
+ else:
145
+ print(f" ✗ Scene '{scene_name}' caching failed: {result[1]}")
146
+ else:
147
+ print(f" ✗ Scene '{scene_name}' loading failed")
148
+
149
+ except Exception as e:
150
+ print(f" ✗ Error caching scene '{scene_name}': {str(e)}")
151
+
152
+ print()
153
+
154
+ print("=" * 60)
155
+ print("Example scene caching completed!")
156
+ print("=" * 60 + "\n")
157
+
158
+ def create_app(self) -> gr.Blocks:
159
+ """
160
+ Create and configure the Gradio application.
161
+
162
+ Returns:
163
+ Configured Gradio Blocks interface
164
+ """
165
+
166
+ # Initialize theme
167
+ def get_theme():
168
+ return get_gradio_theme()
169
+
170
+ with gr.Blocks(theme=get_theme(), css=GRADIO_CSS) as demo:
171
+ # State variables for the tabbed interface
172
+ is_example = gr.Textbox(label="is_example", visible=False, value="None")
173
+ processed_data_state = gr.State(value=None)
174
+ measure_points_state = gr.State(value=[])
175
+ selected_first_frame_state = gr.State(value="")
176
+ selected_image_index_state = gr.State(value=0) # Track selected image index
177
+ # current_view_index = gr.State(value=0) # noqa: F841 Track current view index
178
+
179
+ # Header and description
180
+ self.ui_components.create_header_section()
181
+ self.ui_components.create_description_section()
182
+
183
+ target_dir_output = gr.Textbox(label="Target Dir", visible=False, value="None")
184
+
185
+ # Main content area
186
+ with gr.Row():
187
+ with gr.Column(scale=2):
188
+ # Upload section
189
+ (
190
+ input_video,
191
+ s_time_interval,
192
+ input_images,
193
+ image_gallery,
194
+ select_first_frame_btn,
195
+ ) = self.ui_components.create_upload_section()
196
+
197
+ with gr.Column(scale=4):
198
+ with gr.Column():
199
+ # gr.Markdown("**Metric 3D Reconstruction (Point Cloud and Camera Poses)**")
200
+ # Reconstruction control section (buttons) - moved below tabs
201
+
202
+ log_output = gr.Markdown(
203
+ "Please upload a video or images, then click Reconstruct.",
204
+ elem_classes=["custom-log"],
205
+ )
206
+
207
+ # Tabbed interface
208
+ with gr.Tabs():
209
+ with gr.Tab("Point Cloud & Cameras"):
210
+ reconstruction_output = (
211
+ self.ui_components.create_3d_viewer_section()
212
+ )
213
+
214
+ with gr.Tab("Metric Depth"):
215
+ (
216
+ prev_measure_btn,
217
+ measure_view_selector,
218
+ next_measure_btn,
219
+ measure_image,
220
+ measure_depth_image,
221
+ measure_text,
222
+ ) = self.ui_components.create_measure_section()
223
+
224
+ with gr.Tab("3DGS Rendered Novel Views"):
225
+ gs_video, gs_info = self.ui_components.create_nvs_video()
226
+
227
+ # Inference control section (before inference)
228
+ (process_res_method_dropdown, infer_gs) = (
229
+ self.ui_components.create_inference_control_section()
230
+ )
231
+
232
+ # Display control section - includes 3DGS options, buttons, and Visualization Options # noqa: E501
233
+ (
234
+ show_cam,
235
+ filter_black_bg,
236
+ filter_white_bg,
237
+ save_percentage,
238
+ num_max_points,
239
+ gs_trj_mode,
240
+ gs_video_quality,
241
+ submit_btn,
242
+ clear_btn,
243
+ ) = self.ui_components.create_display_control_section()
244
+
245
+ # bind visibility of gs_trj_mode to infer_gs
246
+ infer_gs.change(
247
+ fn=lambda checked: (
248
+ gr.update(visible=checked),
249
+ gr.update(visible=checked),
250
+ gr.update(visible=checked),
251
+ gr.update(visible=(not checked)),
252
+ ),
253
+ inputs=infer_gs,
254
+ outputs=[gs_trj_mode, gs_video_quality, gs_video, gs_info],
255
+ )
256
+
257
+ # Example scenes section
258
+ gr.Markdown("## Example Scenes")
259
+
260
+ scenes = self.ui_components.create_example_scenes_section()
261
+ scene_components = self.ui_components.create_example_scene_grid(scenes)
262
+
263
+ # Set up event handlers
264
+ self._setup_event_handlers(
265
+ demo,
266
+ is_example,
267
+ processed_data_state,
268
+ measure_points_state,
269
+ target_dir_output,
270
+ input_video,
271
+ input_images,
272
+ s_time_interval,
273
+ image_gallery,
274
+ reconstruction_output,
275
+ log_output,
276
+ show_cam,
277
+ filter_black_bg,
278
+ filter_white_bg,
279
+ process_res_method_dropdown,
280
+ save_percentage,
281
+ submit_btn,
282
+ clear_btn,
283
+ num_max_points,
284
+ infer_gs,
285
+ select_first_frame_btn,
286
+ selected_first_frame_state,
287
+ selected_image_index_state,
288
+ measure_view_selector,
289
+ measure_image,
290
+ measure_depth_image,
291
+ measure_text,
292
+ prev_measure_btn,
293
+ next_measure_btn,
294
+ scenes,
295
+ scene_components,
296
+ gs_video,
297
+ gs_info,
298
+ gs_trj_mode,
299
+ gs_video_quality,
300
+ )
301
+
302
+ # Acknowledgements
303
+ self.ui_components.create_acknowledgements_section()
304
+
305
+ return demo
306
+
307
+ def _setup_event_handlers(
308
+ self,
309
+ demo: gr.Blocks,
310
+ is_example: gr.Textbox,
311
+ processed_data_state: gr.State,
312
+ measure_points_state: gr.State,
313
+ target_dir_output: gr.Textbox,
314
+ input_video: gr.Video,
315
+ input_images: gr.File,
316
+ s_time_interval: gr.Slider,
317
+ image_gallery: gr.Gallery,
318
+ reconstruction_output: gr.Model3D,
319
+ log_output: gr.Markdown,
320
+ show_cam: gr.Checkbox,
321
+ filter_black_bg: gr.Checkbox,
322
+ filter_white_bg: gr.Checkbox,
323
+ process_res_method_dropdown: gr.Dropdown,
324
+ save_percentage: gr.Slider,
325
+ submit_btn: gr.Button,
326
+ clear_btn: gr.ClearButton,
327
+ num_max_points: gr.Slider,
328
+ infer_gs: gr.Checkbox,
329
+ select_first_frame_btn: gr.Button,
330
+ selected_first_frame_state: gr.State,
331
+ selected_image_index_state: gr.State,
332
+ measure_view_selector: gr.Dropdown,
333
+ measure_image: gr.Image,
334
+ measure_depth_image: gr.Image,
335
+ measure_text: gr.Markdown,
336
+ prev_measure_btn: gr.Button,
337
+ next_measure_btn: gr.Button,
338
+ scenes: List[Dict[str, Any]],
339
+ scene_components: List[gr.Image],
340
+ gs_video: gr.Video,
341
+ gs_info: gr.Markdown,
342
+ gs_trj_mode: gr.Dropdown,
343
+ gs_video_quality: gr.Dropdown,
344
+ ) -> None:
345
+ """
346
+ Set up all event handlers for the application.
347
+
348
+ Args:
349
+ demo: Gradio Blocks interface
350
+ All other arguments: Gradio components to connect
351
+ """
352
+ # Configure clear button
353
+ clear_btn.add(
354
+ [
355
+ input_video,
356
+ input_images,
357
+ reconstruction_output,
358
+ log_output,
359
+ target_dir_output,
360
+ image_gallery,
361
+ gs_video,
362
+ ]
363
+ )
364
+
365
+ # Main reconstruction button
366
+ submit_btn.click(
367
+ fn=self.event_handlers.clear_fields, inputs=[], outputs=[reconstruction_output]
368
+ ).then(fn=self.event_handlers.update_log, inputs=[], outputs=[log_output]).then(
369
+ fn=self.event_handlers.gradio_demo,
370
+ inputs=[
371
+ target_dir_output,
372
+ show_cam,
373
+ filter_black_bg,
374
+ filter_white_bg,
375
+ process_res_method_dropdown,
376
+ selected_first_frame_state,
377
+ save_percentage,
378
+ # pass num_max_points
379
+ num_max_points,
380
+ infer_gs,
381
+ gs_trj_mode,
382
+ gs_video_quality,
383
+ ],
384
+ outputs=[
385
+ reconstruction_output,
386
+ log_output,
387
+ processed_data_state,
388
+ measure_image,
389
+ measure_depth_image,
390
+ measure_text,
391
+ measure_view_selector,
392
+ gs_video,
393
+ gs_video, # gs_video visibility
394
+ gs_info, # gs_info visibility
395
+ ],
396
+ ).then(
397
+ fn=lambda: "False",
398
+ inputs=[],
399
+ outputs=[is_example], # set is_example to "False"
400
+ )
401
+
402
+ # Real-time visualization updates
403
+ self._setup_visualization_handlers(
404
+ show_cam,
405
+ filter_black_bg,
406
+ filter_white_bg,
407
+ process_res_method_dropdown,
408
+ target_dir_output,
409
+ is_example,
410
+ reconstruction_output,
411
+ log_output,
412
+ )
413
+
414
+ # File upload handlers
415
+ input_video.change(
416
+ fn=self.event_handlers.handle_uploads,
417
+ inputs=[input_video, input_images, s_time_interval],
418
+ outputs=[reconstruction_output, target_dir_output, image_gallery, log_output],
419
+ )
420
+ input_images.change(
421
+ fn=self.event_handlers.handle_uploads,
422
+ inputs=[input_video, input_images, s_time_interval],
423
+ outputs=[reconstruction_output, target_dir_output, image_gallery, log_output],
424
+ )
425
+
426
+ # Image gallery click handler (for selecting first frame)
427
+ def handle_image_selection(evt: gr.SelectData):
428
+ if evt is None or evt.index is None:
429
+ return "No image selected", 0
430
+ selected_index = evt.index
431
+ return f"Selected image {selected_index} as potential first frame", selected_index
432
+
433
+ image_gallery.select(
434
+ fn=handle_image_selection,
435
+ outputs=[log_output, selected_image_index_state],
436
+ )
437
+
438
+ # Select first frame handler
439
+ select_first_frame_btn.click(
440
+ fn=self.event_handlers.select_first_frame,
441
+ inputs=[image_gallery, selected_image_index_state],
442
+ outputs=[image_gallery, log_output, selected_first_frame_state],
443
+ )
444
+
445
+ # Navigation handlers
446
+ self._setup_navigation_handlers(
447
+ prev_measure_btn,
448
+ next_measure_btn,
449
+ measure_view_selector,
450
+ measure_image,
451
+ measure_depth_image,
452
+ measure_points_state,
453
+ processed_data_state,
454
+ )
455
+
456
+ # Measurement handler
457
+ measure_image.select(
458
+ fn=self.event_handlers.measure,
459
+ inputs=[processed_data_state, measure_points_state, measure_view_selector],
460
+ outputs=[measure_image, measure_depth_image, measure_points_state, measure_text],
461
+ )
462
+
463
+ # Example scene handlers
464
+ self._setup_example_scene_handlers(
465
+ scenes,
466
+ scene_components,
467
+ reconstruction_output,
468
+ target_dir_output,
469
+ image_gallery,
470
+ log_output,
471
+ is_example,
472
+ processed_data_state,
473
+ measure_view_selector,
474
+ measure_image,
475
+ measure_depth_image,
476
+ gs_video,
477
+ gs_info,
478
+ )
479
+
480
+ def _setup_visualization_handlers(
481
+ self,
482
+ show_cam: gr.Checkbox,
483
+ filter_black_bg: gr.Checkbox,
484
+ filter_white_bg: gr.Checkbox,
485
+ process_res_method_dropdown: gr.Dropdown,
486
+ target_dir_output: gr.Textbox,
487
+ is_example: gr.Textbox,
488
+ reconstruction_output: gr.Model3D,
489
+ log_output: gr.Markdown,
490
+ ) -> None:
491
+ """Set up visualization update handlers."""
492
+ # Common inputs for visualization updates
493
+ viz_inputs = [
494
+ target_dir_output,
495
+ show_cam,
496
+ is_example,
497
+ filter_black_bg,
498
+ filter_white_bg,
499
+ process_res_method_dropdown,
500
+ ]
501
+
502
+ # Set up change handlers for all visualization controls
503
+ for component in [show_cam, filter_black_bg, filter_white_bg]:
504
+ component.change(
505
+ fn=self.event_handlers.update_visualization,
506
+ inputs=viz_inputs,
507
+ outputs=[reconstruction_output, log_output],
508
+ )
509
+
510
+ def _setup_navigation_handlers(
511
+ self,
512
+ prev_measure_btn: gr.Button,
513
+ next_measure_btn: gr.Button,
514
+ measure_view_selector: gr.Dropdown,
515
+ measure_image: gr.Image,
516
+ measure_depth_image: gr.Image,
517
+ measure_points_state: gr.State,
518
+ processed_data_state: gr.State,
519
+ ) -> None:
520
+ """Set up navigation handlers for measure tab."""
521
+ # Measure tab navigation
522
+ prev_measure_btn.click(
523
+ fn=lambda processed_data, current_selector: self.event_handlers.navigate_measure_view(
524
+ processed_data, current_selector, -1
525
+ ),
526
+ inputs=[processed_data_state, measure_view_selector],
527
+ outputs=[
528
+ measure_view_selector,
529
+ measure_image,
530
+ measure_depth_image,
531
+ measure_points_state,
532
+ ],
533
+ )
534
+
535
+ next_measure_btn.click(
536
+ fn=lambda processed_data, current_selector: self.event_handlers.navigate_measure_view(
537
+ processed_data, current_selector, 1
538
+ ),
539
+ inputs=[processed_data_state, measure_view_selector],
540
+ outputs=[
541
+ measure_view_selector,
542
+ measure_image,
543
+ measure_depth_image,
544
+ measure_points_state,
545
+ ],
546
+ )
547
+
548
+ measure_view_selector.change(
549
+ fn=lambda processed_data, selector_value: (
550
+ self.event_handlers.update_measure_view(
551
+ processed_data, int(selector_value.split()[1]) - 1
552
+ )
553
+ if selector_value
554
+ else (None, None, [])
555
+ ),
556
+ inputs=[processed_data_state, measure_view_selector],
557
+ outputs=[measure_image, measure_depth_image, measure_points_state],
558
+ )
559
+
560
+ def _setup_example_scene_handlers(
561
+ self,
562
+ scenes: List[Dict[str, Any]],
563
+ scene_components: List[gr.Image],
564
+ reconstruction_output: gr.Model3D,
565
+ target_dir_output: gr.Textbox,
566
+ image_gallery: gr.Gallery,
567
+ log_output: gr.Markdown,
568
+ is_example: gr.Textbox,
569
+ processed_data_state: gr.State,
570
+ measure_view_selector: gr.Dropdown,
571
+ measure_image: gr.Image,
572
+ measure_depth_image: gr.Image,
573
+ gs_video: gr.Video,
574
+ gs_info: gr.Markdown,
575
+ ) -> None:
576
+ """Set up example scene handlers."""
577
+
578
+ def load_and_update_measure(name):
579
+ result = self.event_handlers.load_example_scene(name)
580
+ # result = (reconstruction_output, target_dir, image_paths, log_message, processed_data, measure_view_selector, gs_video, gs_video_vis, gs_info_vis) # noqa: E501
581
+
582
+ # Update measure view if processed_data is available
583
+ measure_img = None
584
+ measure_depth = None
585
+ if result[4] is not None: # processed_data exists
586
+ measure_img, measure_depth, _ = (
587
+ self.event_handlers.visualization_handler.update_measure_view(result[4], 0)
588
+ )
589
+
590
+ return result + ("True", measure_img, measure_depth)
591
+
592
+ for i, scene in enumerate(scenes):
593
+ if i < len(scene_components):
594
+ scene_components[i].select(
595
+ fn=lambda name=scene["name"]: load_and_update_measure(name),
596
+ outputs=[
597
+ reconstruction_output,
598
+ target_dir_output,
599
+ image_gallery,
600
+ log_output,
601
+ processed_data_state,
602
+ measure_view_selector,
603
+ gs_video,
604
+ gs_video, # gs_video_visibility
605
+ gs_info, # gs_info_visibility
606
+ is_example,
607
+ measure_image,
608
+ measure_depth_image,
609
+ ],
610
+ )
611
+
612
+ def launch(self, host: str = "127.0.0.1", port: int = 7860, **kwargs) -> None:
613
+ """
614
+ Launch the application.
615
+
616
+ Args:
617
+ host: Host address to bind to
618
+ port: Port number to bind to
619
+ **kwargs: Additional arguments for demo.launch()
620
+ """
621
+ demo = self.create_app()
622
+ demo.queue(max_size=20).launch(
623
+ show_error=True, ssr_mode=False, server_name=host, server_port=port, **kwargs
624
+ )
625
+
626
+
627
+ def main():
628
+ """Main function to run the application."""
629
+ parser = argparse.ArgumentParser(
630
+ description="Depth Anything 3 Gradio Application",
631
+ formatter_class=argparse.RawDescriptionHelpFormatter,
632
+ epilog="""
633
+ Examples:
634
+ # Basic usage
635
+ python gradio_app.py --help
636
+ python gradio_app.py --host 0.0.0.0 --port 8080
637
+ python gradio_app.py --model-dir /path/to/model --workspace-dir /path/to/workspace
638
+
639
+ # Cache examples at startup (all low-res)
640
+ python gradio_app.py --cache-examples
641
+
642
+ # Cache with selective high-res+3DGS for scenes matching tag
643
+ python gradio_app.py --cache-examples --cache-gs-tag dl3dv
644
+ # This will use high-res + 3DGS for scenes containing "dl3dv" in their name,
645
+ # and low-res only for other scenes
646
+ """,
647
+ )
648
+
649
+ # Server configuration
650
+ parser.add_argument(
651
+ "--host", default="127.0.0.1", help="Host address to bind to (default: 127.0.0.1)"
652
+ )
653
+ parser.add_argument(
654
+ "--port", type=int, default=7860, help="Port number to bind to (default: 7860)"
655
+ )
656
+
657
+ # Directory configuration
658
+ parser.add_argument(
659
+ "--model-dir",
660
+ default="depth-anything/DA3NESTED-GIANT-LARGE",
661
+ help="Path to the model directory (default: depth-anything/DA3NESTED-GIANT-LARGE)",
662
+ )
663
+ parser.add_argument(
664
+ "--workspace-dir",
665
+ default="workspace/gradio", # noqa: E501
666
+ help="Path to the workspace directory (default: workspace/gradio)", # noqa: E501
667
+ )
668
+ parser.add_argument(
669
+ "--gallery-dir",
670
+ default="workspace/gallery",
671
+ help="Path to the gallery directory (default: workspace/gallery)", # noqa: E501
672
+ )
673
+
674
+ # Additional Gradio options
675
+ parser.add_argument("--share", action="store_true", help="Create a public link for the app")
676
+ parser.add_argument("--debug", action="store_true", help="Enable debug mode")
677
+
678
+ # Example caching options
679
+ parser.add_argument(
680
+ "--cache-examples",
681
+ action="store_true",
682
+ help="Pre-cache all example scenes at startup for faster loading",
683
+ )
684
+ parser.add_argument(
685
+ "--cache-gs-tag",
686
+ type=str,
687
+ default="",
688
+ help="Tag to match scene names for high-res+3DGS caching (e.g., 'dl3dv'). Scenes containing this tag will use high_res and infer_gs=True; others will use low_res only.", # noqa: E501
689
+ )
690
+
691
+ args = parser.parse_args()
692
+
693
+ # Create directories if they don't exist
694
+ os.makedirs(args.workspace_dir, exist_ok=True)
695
+ os.makedirs(args.gallery_dir, exist_ok=True)
696
+
697
+ # Initialize and launch the application
698
+ app = DepthAnything3App(
699
+ model_dir=args.model_dir, workspace_dir=args.workspace_dir, gallery_dir=args.gallery_dir
700
+ )
701
+
702
+ # Prepare launch arguments
703
+ launch_kwargs = {"share": args.share, "debug": args.debug}
704
+
705
+ print("Starting Depth Anything 3 Gradio App...")
706
+ print(f"Host: {args.host}")
707
+ print(f"Port: {args.port}")
708
+ print(f"Model Directory: {args.model_dir}")
709
+ print(f"Workspace Directory: {args.workspace_dir}")
710
+ print(f"Gallery Directory: {args.gallery_dir}")
711
+ print(f"Share: {args.share}")
712
+ print(f"Debug: {args.debug}")
713
+ print(f"Cache Examples: {args.cache_examples}")
714
+ if args.cache_examples:
715
+ if args.cache_gs_tag:
716
+ print(
717
+ f"Cache GS Tag: '{args.cache_gs_tag}' (scenes matching this tag will use high-res + 3DGS)" # noqa: E501
718
+ ) # noqa: E501
719
+ else:
720
+ print("Cache GS Tag: None (all scenes will use low-res only)")
721
+
722
+ # Pre-cache examples if requested
723
+ if args.cache_examples:
724
+ print("\n" + "=" * 60)
725
+ print("Pre-caching mode enabled")
726
+ if args.cache_gs_tag:
727
+ print(f"Scenes containing '{args.cache_gs_tag}' will use HIGH-RES + 3DGS")
728
+ print("Other scenes will use LOW-RES only")
729
+ else:
730
+ print("All scenes will use LOW-RES only")
731
+ print("=" * 60)
732
+ app.cache_examples(
733
+ show_cam=True,
734
+ filter_black_bg=False,
735
+ filter_white_bg=False,
736
+ save_percentage=5.0,
737
+ num_max_points=1000,
738
+ cache_gs_tag=args.cache_gs_tag,
739
+ gs_trj_mode="smooth",
740
+ gs_video_quality="low",
741
+ )
742
+
743
+ app.launch(host=args.host, port=args.port, **launch_kwargs)
744
+
745
+
746
+ if __name__ == "__main__":
747
+ main()
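Besides the CLI above, the app can be launched programmatically. A sketch using the constructor and `launch` signature defined in this file; the directory values are placeholders matching the CLI defaults.

```python
from depth_anything_3.app.gradio_app import DepthAnything3App

app = DepthAnything3App(
    model_dir="depth-anything/DA3NESTED-GIANT-LARGE",
    workspace_dir="workspace/gradio",
    gallery_dir="workspace/gallery",
)
app.launch(host="0.0.0.0", port=7860, share=False)
```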
src/depth_anything_3/app/modules/__init__.py ADDED
@@ -0,0 +1,45 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Modules package for Depth Anything 3 Gradio app.
17
+
18
+ This package contains all the modular components for the Gradio application.
19
+ """
20
+
21
+ from depth_anything_3.app.modules.event_handlers import EventHandlers
22
+ from depth_anything_3.app.modules.file_handlers import FileHandler
23
+ from depth_anything_3.app.modules.model_inference import ModelInference
24
+ from depth_anything_3.app.modules.ui_components import UIComponents
25
+ from depth_anything_3.app.modules.utils import (
26
+ cleanup_memory,
27
+ create_depth_visualization,
28
+ get_logo_base64,
29
+ get_scene_info,
30
+ save_to_gallery_func,
31
+ )
32
+ from depth_anything_3.app.modules.visualization import VisualizationHandler
33
+
34
+ __all__ = [
35
+ "ModelInference",
36
+ "FileHandler",
37
+ "VisualizationHandler",
38
+ "EventHandlers",
39
+ "UIComponents",
40
+ "create_depth_visualization",
41
+ "save_to_gallery_func",
42
+ "get_scene_info",
43
+ "cleanup_memory",
44
+ "get_logo_base64",
45
+ ]
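Because of these re-exports, callers can import the components directly from the package; a small usage sketch:

```python
from depth_anything_3.app.modules import EventHandlers, UIComponents

handlers = EventHandlers()   # wires up model inference, file and visualization handlers
ui = UIComponents()          # builds the Gradio layout sections
```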
src/depth_anything_3/app/modules/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (1.22 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/event_handlers.cpython-311.pyc ADDED
Binary file (25.3 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/file_handlers.cpython-311.pyc ADDED
Binary file (13 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/model_inference.cpython-311.pyc ADDED
Binary file (12.3 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/ui_components.cpython-311.pyc ADDED
Binary file (19.3 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/utils.cpython-311.pyc ADDED
Binary file (9.22 kB). View file
 
src/depth_anything_3/app/modules/__pycache__/visualization.cpython-311.pyc ADDED
Binary file (19 kB). View file
 
src/depth_anything_3/app/modules/event_handlers.py ADDED
@@ -0,0 +1,629 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Event handling module for Depth Anything 3 Gradio app.
17
+
18
+ This module handles all event callbacks and user interactions.
19
+ """
20
+
21
+ import os
22
+ import time
23
+ from glob import glob
24
+ from typing import Any, Dict, List, Optional, Tuple
25
+ import gradio as gr
26
+ import numpy as np
27
+ import torch
28
+
29
+ from depth_anything_3.app.modules.file_handlers import FileHandler
30
+ from depth_anything_3.app.modules.model_inference import ModelInference
31
+ from depth_anything_3.app.modules.utils import cleanup_memory
32
+ from depth_anything_3.app.modules.visualization import VisualizationHandler
33
+
34
+
35
+ class EventHandlers:
36
+ """
37
+ Handles all event callbacks and user interactions for the Gradio app.
38
+ """
39
+
40
+ def __init__(self):
41
+ """Initialize the event handlers."""
42
+ self.model_inference = ModelInference()
43
+ self.file_handler = FileHandler()
44
+ self.visualization_handler = VisualizationHandler()
45
+
46
+ def clear_fields(self) -> None:
47
+ """
48
+ Clears the 3D viewer, the stored target_dir, and empties the gallery.
49
+ """
50
+ return None
51
+
52
+ def update_log(self) -> str:
53
+ """
54
+ Display a quick log message while waiting.
55
+ """
56
+ return "Loading and Reconstructing..."
57
+
58
+ def save_current_visualization(
59
+ self,
60
+ target_dir: str,
61
+ save_percentage: float,
62
+ show_cam: bool,
63
+ filter_black_bg: bool,
64
+ filter_white_bg: bool,
65
+ processed_data: Optional[Dict],
66
+ scene_name: str = "",
67
+ ) -> str:
68
+ """
69
+ Save current visualization results to gallery with specified save percentage.
70
+
71
+ Args:
72
+ target_dir: Directory containing results
73
+ save_percentage: Confidence-percentile threshold used when filtering points (0-100)
74
+ show_cam: Whether to show cameras
75
+ filter_black_bg: Whether to filter black background
76
+ filter_white_bg: Whether to filter white background
77
+ processed_data: Processed data from reconstruction
+ scene_name: Optional scene name used when building the gallery folder name
78
+
79
+ Returns:
80
+ Status message
81
+ """
82
+ if not target_dir or target_dir == "None" or not os.path.isdir(target_dir):
83
+ return "No reconstruction available. Please run 'Reconstruct' first."
84
+
85
+ if processed_data is None:
86
+ return "No processed data available. Please run 'Reconstruct' first."
87
+
88
+ try:
89
+ # Add debug information
90
+ print("[DEBUG] save_current_visualization called with:")
91
+ print(f" target_dir: {target_dir}")
92
+ print(f" save_percentage: {save_percentage}")
93
+ print(f" show_cam: {show_cam}")
94
+ print(f" filter_black_bg: {filter_black_bg}")
95
+ print(f" filter_white_bg: {filter_white_bg}")
96
+ print(f" processed_data: {processed_data is not None}")
97
+
98
+ # Import the gallery save function
99
+ # Create gallery name with user input or auto-generated
100
+ import datetime
101
+
102
+ from .utils import save_to_gallery_func
103
+
104
+ timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
105
+ if scene_name and scene_name.strip():
106
+ gallery_name = f"{scene_name.strip()}_{timestamp}_pct{save_percentage:.0f}"
107
+ else:
108
+ gallery_name = f"save_{timestamp}_pct{save_percentage:.0f}"
109
+
110
+ print(f"[DEBUG] Saving to gallery with name: {gallery_name}")
111
+
112
+ # Save entire process folder to gallery
113
+ success, message = save_to_gallery_func(
114
+ target_dir=target_dir, processed_data=processed_data, gallery_name=gallery_name
115
+ )
116
+
117
+ if success:
118
+ print(f"[DEBUG] Gallery save completed successfully: {message}")
119
+ return (
120
+ "Successfully saved to gallery!\n"
121
+ f"Gallery name: {gallery_name}\n"
122
+ f"Save percentage: {save_percentage}%\n"
123
+ f"Show cameras: {show_cam}\n"
124
+ f"Filter black bg: {filter_black_bg}\n"
125
+ f"Filter white bg: {filter_white_bg}\n\n"
126
+ f"{message}"
127
+ )
128
+ else:
129
+ print(f"[DEBUG] Gallery save failed: {message}")
130
+ return f"Failed to save to gallery: {message}"
131
+
132
+ except Exception as e:
133
+ return f"Error saving visualization: {str(e)}"
134
+
135
+ def gradio_demo(
136
+ self,
137
+ target_dir: str,
138
+ show_cam: bool = True,
139
+ filter_black_bg: bool = False,
140
+ filter_white_bg: bool = False,
141
+ process_res_method: str = "upper_bound_resize",
142
+ selected_first_frame: str = "",
143
+ save_percentage: float = 30.0,
144
+ num_max_points: int = 1_000_000,
145
+ infer_gs: bool = False,
146
+ gs_trj_mode: str = "extend",
147
+ gs_video_quality: str = "high",
148
+ ) -> Tuple[
149
+ Optional[str],
150
+ str,
151
+ Optional[Dict],
152
+ Optional[np.ndarray],
153
+ Optional[np.ndarray],
154
+ str,
155
+ gr.Dropdown,
156
+ Optional[str], # gs video path
157
+ gr.update, # gs video visibility update
158
+ gr.update, # gs info visibility update
159
+ ]:
160
+ """
161
+ Perform reconstruction using the already-created target_dir/images.
162
+
163
+ Args:
164
+ target_dir: Directory containing images
165
+ show_cam: Whether to show camera
166
+ filter_black_bg: Whether to filter black background
167
+ filter_white_bg: Whether to filter white background
168
+ process_res_method: Method for resizing input images
169
+ selected_first_frame: Selected first frame filename
170
+ save_percentage: Confidence-percentile threshold for point filtering (0-100)
+ num_max_points: Maximum number of points to export to GLB, in thousands (K)
+ infer_gs: Whether to infer 3D Gaussian Splatting
+ gs_trj_mode: Trajectory mode for 3DGS novel-view rendering ("extend" or "smooth")
+ gs_video_quality: Quality preset for the rendered 3DGS video ("low", "medium", "high")
171
+
172
+ Returns:
173
+ Tuple of reconstruction results
174
+ """
175
+ if not os.path.isdir(target_dir) or target_dir == "None":
176
+ return (
177
+ None,
178
+ "No valid target directory found. Please upload first.",
179
+ None,
180
+ None,
181
+ None,
182
+ "",
183
+ None,
184
+ None,
185
+ gr.update(visible=False), # gs_video
186
+ gr.update(visible=True), # gs_info
187
+ )
188
+
189
+ start_time = time.time()
190
+ cleanup_memory()
191
+
192
+ # Get image files for logging
193
+ target_dir_images = os.path.join(target_dir, "images")
194
+ all_files = (
195
+ sorted(os.listdir(target_dir_images)) if os.path.isdir(target_dir_images) else []
196
+ )
197
+
198
+ print("Running DepthAnything3 model...")
199
+ print(f"Selected first frame: {selected_first_frame}")
200
+
201
+ # Validate selected_first_frame against current image list
202
+ if selected_first_frame and target_dir_images:
203
+ current_files = (
204
+ sorted(os.listdir(target_dir_images)) if os.path.isdir(target_dir_images) else []
205
+ )
206
+ if selected_first_frame not in current_files:
207
+ print(
208
+ f"Selected first frame '{selected_first_frame}' not found in "
209
+ "current images. Using default order."
210
+ )
211
+ selected_first_frame = "" # Reset to use default order
212
+
213
+ with torch.no_grad():
214
+ prediction, processed_data = self.model_inference.run_inference(
215
+ target_dir,
216
+ process_res_method=process_res_method,
217
+ show_camera=show_cam,
218
+ selected_first_frame=selected_first_frame,
219
+ save_percentage=save_percentage,
220
+ num_max_points=int(num_max_points * 1000), # Convert K to actual count
221
+ infer_gs=infer_gs,
222
+ gs_trj_mode=gs_trj_mode,
223
+ gs_video_quality=gs_video_quality,
224
+ )
225
+
226
+ # The GLB file is already generated by the API
227
+ glbfile = os.path.join(target_dir, "scene.glb")
228
+
229
+ # Handle 3DGS video based on infer_gs flag
230
+ gsvideo_path = None
231
+ gs_video_visible = False
232
+ gs_info_visible = True
233
+
234
+ if infer_gs:
235
+ try:
236
+ gsvideo_path = sorted(glob(os.path.join(target_dir, "gs_video", "*.mp4")))[-1]
237
+ gs_video_visible = True
238
+ gs_info_visible = False
239
+ except IndexError:
240
+ gsvideo_path = None
241
+ print("3DGS video not found, but infer_gs was enabled")
242
+
243
+ # Cleanup
244
+ cleanup_memory()
245
+
246
+ end_time = time.time()
247
+ print(f"Total time: {end_time - start_time:.2f} seconds")
248
+ log_msg = f"Reconstruction Success ({len(all_files)} frames). Waiting for visualization."
249
+
250
+ # Populate visualization tabs with processed data
251
+ depth_vis, measure_img, measure_depth_vis, measure_pts = (
252
+ self.visualization_handler.populate_visualization_tabs(processed_data)
253
+ )
254
+
255
+ # Update view selectors based on available views
256
+ depth_selector, measure_selector = self.visualization_handler.update_view_selectors(
257
+ processed_data
258
+ )
259
+
260
+ return (
261
+ glbfile,
262
+ log_msg,
263
+ processed_data,
264
+ measure_img, # measure_image
265
+ measure_depth_vis, # measure_depth_image
266
+ "", # measure_text (empty initially)
267
+ measure_selector, # measure_view_selector
268
+ gsvideo_path,
269
+ gr.update(visible=gs_video_visible), # gs_video visibility
270
+ gr.update(visible=gs_info_visible), # gs_info visibility
271
+ )
272
+
273
+ def update_visualization(
274
+ self,
275
+ target_dir: str,
276
+ show_cam: bool,
277
+ is_example: str,
278
+ filter_black_bg: bool = False,
279
+ filter_white_bg: bool = False,
280
+ process_res_method: str = "upper_bound_resize",
281
+ ) -> Tuple[gr.update, str]:
282
+ """
283
+ Reload saved predictions from npz, create (or reuse) the GLB for new parameters,
284
+ and return it for the 3D viewer.
285
+
286
+ Args:
287
+ target_dir: Directory containing results
288
+ show_cam: Whether to show camera
289
+ is_example: Whether this is an example scene
290
+ filter_black_bg: Whether to filter black background
291
+ filter_white_bg: Whether to filter white background
292
+ process_res_method: Method for resizing input images
293
+
294
+ Returns:
295
+ Tuple of (glb_file, log_message)
296
+ """
297
+ if not target_dir or target_dir == "None" or not os.path.isdir(target_dir):
298
+ return (
299
+ gr.update(),
300
+ "No reconstruction available. Please click the Reconstruct button first.",
301
+ )
302
+
303
+ # Check if GLB exists (could be cached example or reconstructed scene)
304
+ glbfile = os.path.join(target_dir, "scene.glb")
305
+ if os.path.exists(glbfile):
306
+ return (
307
+ glbfile,
308
+ (
309
+ "Visualization loaded from cache."
310
+ if is_example == "True"
311
+ else "Visualization updated."
312
+ ),
313
+ )
314
+
315
+ # If no GLB but it's an example that hasn't been reconstructed yet
316
+ if is_example == "True":
317
+ return (
318
+ gr.update(),
319
+ "No reconstruction available. Please click the Reconstruct button first.",
320
+ )
321
+
322
+ # For non-examples, check predictions.npz
323
+ predictions_path = os.path.join(target_dir, "predictions.npz")
324
+ if not os.path.exists(predictions_path):
325
+ error_message = (
326
+ f"No reconstruction available at {predictions_path}. "
327
+ "Please run 'Reconstruct' first."
328
+ )
329
+ return gr.update(), error_message
330
+
331
+ loaded = np.load(predictions_path, allow_pickle=True)
332
+ predictions = {key: loaded[key] for key in loaded.keys()} # noqa: F841
333
+
334
+ return (
335
+ glbfile,
336
+ "Visualization updated.",
337
+ )
338
+
339
+ def handle_uploads(
340
+ self,
341
+ input_video: Optional[str],
342
+ input_images: Optional[List],
343
+ s_time_interval: float = 10.0,
344
+ ) -> Tuple[Optional[str], Optional[str], Optional[List], Optional[str]]:
345
+ """
346
+ Handle file uploads and update gallery.
347
+
348
+ Args:
349
+ input_video: Path to input video file
350
+ input_images: List of input image files
351
+ s_time_interval: Sampling FPS (frames per second) for frame extraction
352
+
353
+ Returns:
354
+ Tuple of (reconstruction_output, target_dir, image_paths, log_message)
355
+ """
356
+ return self.file_handler.update_gallery_on_upload(
357
+ input_video, input_images, s_time_interval
358
+ )
359
+
360
+ def load_example_scene(self, scene_name: str, examples_dir: str = None) -> Tuple[
361
+ Optional[str],
362
+ Optional[str],
363
+ Optional[List],
364
+ str,
365
+ Optional[Dict],
366
+ gr.Dropdown,
367
+ Optional[str],
368
+ gr.update,
369
+ gr.update,
370
+ ]:
371
+ """
372
+ Load a scene from examples directory.
373
+
374
+ Args:
375
+ scene_name: Name of the scene to load
376
+ examples_dir: Path to examples directory (if None, uses workspace_dir/examples)
377
+
378
+ Returns:
379
+ Tuple of (reconstruction_output, target_dir, image_paths, log_message, processed_data, measure_view_selector, gs_video, gs_video_vis, gs_info_vis) # noqa: E501
380
+ """
381
+ if examples_dir is None:
382
+ # Get workspace directory from environment variable
383
+ workspace_dir = os.environ.get("DA3_WORKSPACE_DIR", "gradio_workspace")
384
+ examples_dir = os.path.join(workspace_dir, "examples")
385
+
386
+ reconstruction_output, target_dir, image_paths, log_message = (
387
+ self.file_handler.load_example_scene(scene_name, examples_dir)
388
+ )
389
+
390
+ # Try to load cached processed data if available
391
+ processed_data = None
392
+ measure_view_selector = gr.Dropdown(choices=["View 1"], value="View 1")
393
+ gs_video_path = None
394
+ gs_video_visible = False
395
+ gs_info_visible = True
396
+
397
+ if target_dir and target_dir != "None":
398
+ predictions_path = os.path.join(target_dir, "predictions.npz")
399
+ if os.path.exists(predictions_path):
400
+ try:
401
+ # Load predictions from cache
402
+ loaded = np.load(predictions_path, allow_pickle=True)
403
+ predictions = {key: loaded[key] for key in loaded.keys()}
404
+
405
+ # Reconstruct processed_data structure
406
+ num_images = len(predictions.get("images", []))
407
+ processed_data = {}
408
+
409
+ for i in range(num_images):
410
+ processed_data[i] = {
411
+ "image": predictions["images"][i] if "images" in predictions else None,
412
+ "depth": predictions["depths"][i] if "depths" in predictions else None,
413
+ "depth_image": os.path.join(
414
+ target_dir, "depth_vis", f"{i:04d}.jpg" # Fixed: use .jpg not .png
415
+ ),
416
+ "intrinsics": (
417
+ predictions["intrinsics"][i]
418
+ if "intrinsics" in predictions
419
+ and i < len(predictions["intrinsics"])
420
+ else None
421
+ ),
422
+ "mask": None,
423
+ }
424
+
425
+ # Update measure view selector
426
+ choices = [f"View {i + 1}" for i in range(num_images)]
427
+ measure_view_selector = gr.Dropdown(choices=choices, value=choices[0])
428
+
429
+ except Exception as e:
430
+ print(f"Error loading cached data: {e}")
431
+
432
+ # Check for cached 3DGS video
433
+ gs_video_dir = os.path.join(target_dir, "gs_video")
434
+ if os.path.exists(gs_video_dir):
435
+ try:
436
+ from glob import glob
437
+
438
+ gs_videos = sorted(glob(os.path.join(gs_video_dir, "*.mp4")))
439
+ if gs_videos:
440
+ gs_video_path = gs_videos[-1]
441
+ gs_video_visible = True
442
+ gs_info_visible = False
443
+ print(f"Loaded cached 3DGS video: {gs_video_path}")
444
+ except Exception as e:
445
+ print(f"Error loading cached 3DGS video: {e}")
446
+
447
+ return (
448
+ reconstruction_output,
449
+ target_dir,
450
+ image_paths,
451
+ log_message,
452
+ processed_data,
453
+ measure_view_selector,
454
+ gs_video_path,
455
+ gr.update(visible=gs_video_visible),
456
+ gr.update(visible=gs_info_visible),
457
+ )
458
+
459
+ def navigate_depth_view(
460
+ self,
461
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
462
+ current_selector: str,
463
+ direction: int,
464
+ ) -> Tuple[str, Optional[str]]:
465
+ """
466
+ Navigate depth view.
467
+
468
+ Args:
469
+ processed_data: Processed data dictionary
470
+ current_selector: Current selector value
471
+ direction: Direction to navigate
472
+
473
+ Returns:
474
+ Tuple of (new_selector_value, depth_vis)
475
+ """
476
+ return self.visualization_handler.navigate_depth_view(
477
+ processed_data, current_selector, direction
478
+ )
479
+
480
+ def update_depth_view(
481
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]], view_index: int
482
+ ) -> Optional[str]:
483
+ """
484
+ Update depth view for a specific view index.
485
+
486
+ Args:
487
+ processed_data: Processed data dictionary
488
+ view_index: Index of the view to update
489
+
490
+ Returns:
491
+ Path to depth visualization image or None
492
+ """
493
+ return self.visualization_handler.update_depth_view(processed_data, view_index)
494
+
495
+ def navigate_measure_view(
496
+ self,
497
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
498
+ current_selector: str,
499
+ direction: int,
500
+ ) -> Tuple[str, Optional[np.ndarray], Optional[np.ndarray], List]:
501
+ """
502
+ Navigate measure view.
503
+
504
+ Args:
505
+ processed_data: Processed data dictionary
506
+ current_selector: Current selector value
507
+ direction: Direction to navigate
508
+
509
+ Returns:
510
+ Tuple of (new_selector_value, measure_image, depth_right_half, measure_points)
511
+ """
512
+ return self.visualization_handler.navigate_measure_view(
513
+ processed_data, current_selector, direction
514
+ )
515
+
516
+ def update_measure_view(
517
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]], view_index: int
518
+ ) -> Tuple[Optional[np.ndarray], Optional[np.ndarray], List]:
519
+ """
520
+ Update measure view for a specific view index.
521
+
522
+ Args:
523
+ processed_data: Processed data dictionary
524
+ view_index: Index of the view to update
525
+
526
+ Returns:
527
+ Tuple of (measure_image, depth_right_half, measure_points)
528
+ """
529
+ return self.visualization_handler.update_measure_view(processed_data, view_index)
530
+
531
+ def measure(
532
+ self,
533
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
534
+ measure_points: List,
535
+ current_view_selector: str,
536
+ event: gr.SelectData,
537
+ ) -> List:
538
+ """
539
+ Handle measurement on images.
540
+
541
+ Args:
542
+ processed_data: Processed data dictionary
543
+ measure_points: List of current measure points
544
+ current_view_selector: Current view selector value
545
+ event: Gradio select event
546
+
547
+ Returns:
548
+ List of [image, depth_right_half, measure_points, text]
549
+ """
550
+ return self.visualization_handler.measure(
551
+ processed_data, measure_points, current_view_selector, event
552
+ )
553
+
554
+ def select_first_frame(
555
+ self, image_gallery: List, selected_index: int = 0
556
+ ) -> Tuple[List, str, str]:
557
+ """
558
+ Select the first frame from the image gallery.
559
+
560
+ Args:
561
+ image_gallery: List of images in the gallery
562
+ selected_index: Index of the selected image (default: 0)
563
+
564
+ Returns:
565
+ Tuple of (updated_image_gallery, log_message, selected_frame_path)
566
+ """
567
+ try:
568
+ if not image_gallery or len(image_gallery) == 0:
569
+ return image_gallery, "No images available to select as first frame.", ""
570
+
571
+ # Handle None or invalid selected_index
572
+ if (
573
+ selected_index is None
574
+ or selected_index < 0
575
+ or selected_index >= len(image_gallery)
576
+ ):
577
+ print(f"Invalid selected_index: {selected_index}, using default: 0")
578
+ selected_index = 0
579
+
580
+ # Get the selected image based on index
581
+ selected_image = image_gallery[selected_index]
582
+ print(f"Selected image index: {selected_index}")
583
+ print(f"Total images: {len(image_gallery)}")
584
+
585
+ # Extract the file path from the selected image
586
+ selected_frame_path = ""
587
+ print(f"Selected image type: {type(selected_image)}")
588
+ print(f"Selected image: {selected_image}")
589
+
590
+ if isinstance(selected_image, tuple):
591
+ # Gradio Gallery returns tuple (path, None)
592
+ selected_frame_path = selected_image[0]
593
+ elif isinstance(selected_image, str):
594
+ selected_frame_path = selected_image
595
+ elif hasattr(selected_image, "name"):
596
+ selected_frame_path = selected_image.name
597
+ elif isinstance(selected_image, dict):
598
+ if "name" in selected_image:
599
+ selected_frame_path = selected_image["name"]
600
+ elif "path" in selected_image:
601
+ selected_frame_path = selected_image["path"]
602
+ elif "src" in selected_image:
603
+ selected_frame_path = selected_image["src"]
604
+ else:
605
+ # Try to convert to string
606
+ selected_frame_path = str(selected_image)
607
+
608
+ print(f"Extracted path: {selected_frame_path}")
609
+
610
+ # Extract filename from the path for matching
611
+ import os
612
+
613
+ selected_filename = os.path.basename(selected_frame_path)
614
+ print(f"Selected filename: {selected_filename}")
615
+
616
+ # Move the selected image to the front
617
+ updated_gallery = [selected_image] + [
618
+ img for img in image_gallery if img != selected_image
619
+ ]
620
+
621
+ log_message = (
622
+ f"Selected frame: {selected_filename}. "
623
+ f"Moved to first position. Total frames: {len(updated_gallery)}"
624
+ )
625
+ return updated_gallery, log_message, selected_filename
626
+
627
+ except Exception as e:
628
+ print(f"Error selecting first frame: {e}")
629
+ return image_gallery, f"Error selecting first frame: {e}", ""
src/depth_anything_3/app/modules/file_handlers.py ADDED
@@ -0,0 +1,304 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ File handling module for Depth Anything 3 Gradio app.
17
+
18
+ This module handles file uploads, video processing, and file operations.
19
+ """
20
+
21
+ import os
22
+ import shutil
23
+ import time
24
+ from datetime import datetime
25
+ from typing import List, Optional, Tuple
26
+ import cv2
27
+ from PIL import Image
28
+ from pillow_heif import register_heif_opener
29
+
30
+ register_heif_opener()
31
+
32
+
33
+ class FileHandler:
34
+ """
35
+ Handles file uploads and processing for the Gradio app.
36
+ """
37
+
38
+ def __init__(self):
39
+ """Initialize the file handler."""
40
+
41
+ def handle_uploads(
42
+ self,
43
+ input_video: Optional[str],
44
+ input_images: Optional[List],
45
+ s_time_interval: float = 10.0,
46
+ ) -> Tuple[str, List[str]]:
47
+ """
48
+ Create a new 'target_dir' + 'images' subfolder, and place user-uploaded
49
+ images or extracted frames from video into it.
50
+
51
+ Args:
52
+ input_video: Path to input video file
53
+ input_images: List of input image files
54
+ s_time_interval: Sampling FPS (frames per second) for frame extraction
55
+
56
+ Returns:
57
+ Tuple of (target_dir, image_paths)
58
+ """
59
+ start_time = time.time()
60
+
61
+ # Get workspace directory from environment variable or use default
62
+ workspace_dir = os.environ.get("DA3_WORKSPACE_DIR", "gradio_workspace")
63
+ if not os.path.exists(workspace_dir):
64
+ os.makedirs(workspace_dir)
65
+
66
+ # Create input_images subdirectory
67
+ input_images_dir = os.path.join(workspace_dir, "input_images")
68
+ if not os.path.exists(input_images_dir):
69
+ os.makedirs(input_images_dir)
70
+
71
+ # Create a unique folder name within input_images
72
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
73
+ target_dir = os.path.join(input_images_dir, f"session_{timestamp}")
74
+ target_dir_images = os.path.join(target_dir, "images")
75
+
76
+ # Clean up if somehow that folder already exists
77
+ if os.path.exists(target_dir):
78
+ shutil.rmtree(target_dir)
79
+ os.makedirs(target_dir)
80
+ os.makedirs(target_dir_images)
81
+
82
+ image_paths = []
83
+
84
+ # Handle images
85
+ if input_images is not None:
86
+ image_paths.extend(self._process_images(input_images, target_dir_images))
87
+
88
+ # Handle video
89
+ if input_video is not None:
90
+ image_paths.extend(
91
+ self._process_video(input_video, target_dir_images, s_time_interval)
92
+ )
93
+
94
+ # Sort final images for gallery
95
+ image_paths = sorted(image_paths)
96
+
97
+ end_time = time.time()
98
+ print(f"Files copied to {target_dir_images}; took {end_time - start_time:.3f} seconds")
99
+ return target_dir, image_paths
100
+
101
+ def _process_images(self, input_images: List, target_dir_images: str) -> List[str]:
102
+ """
103
+ Process uploaded images.
104
+
105
+ Args:
106
+ input_images: List of input image files
107
+ target_dir_images: Target directory for images
108
+
109
+ Returns:
110
+ List of processed image paths
111
+ """
112
+ image_paths = []
113
+
114
+ for file_data in input_images:
115
+ if isinstance(file_data, dict) and "name" in file_data:
116
+ file_path = file_data["name"]
117
+ else:
118
+ file_path = file_data
119
+
120
+ # Check if the file is a HEIC image
121
+ file_ext = os.path.splitext(file_path)[1].lower()
122
+ if file_ext in [".heic", ".heif"]:
123
+ # Convert HEIC to JPEG for better gallery compatibility
124
+ try:
125
+ with Image.open(file_path) as img:
126
+ # Convert to RGB if necessary (HEIC can have different color modes)
127
+ if img.mode not in ("RGB", "L"):
128
+ img = img.convert("RGB")
129
+
130
+ # Create JPEG filename
131
+ base_name = os.path.splitext(os.path.basename(file_path))[0]
132
+ dst_path = os.path.join(target_dir_images, f"{base_name}.jpg")
133
+
134
+ # Save as JPEG with high quality
135
+ img.save(dst_path, "JPEG", quality=95)
136
+ image_paths.append(dst_path)
137
+ print(
138
+ f"Converted HEIC to JPEG: {os.path.basename(file_path)} -> "
139
+ f"{os.path.basename(dst_path)}"
140
+ )
141
+ except Exception as e:
142
+ print(f"Error converting HEIC file {file_path}: {e}")
143
+ # Fall back to copying as is
144
+ dst_path = os.path.join(target_dir_images, os.path.basename(file_path))
145
+ shutil.copy(file_path, dst_path)
146
+ image_paths.append(dst_path)
147
+ else:
148
+ # Regular image files - copy as is
149
+ dst_path = os.path.join(target_dir_images, os.path.basename(file_path))
150
+ shutil.copy(file_path, dst_path)
151
+ image_paths.append(dst_path)
152
+
153
+ return image_paths
154
+
155
+ def _process_video(
156
+ self, input_video: str, target_dir_images: str, s_time_interval: float
157
+ ) -> List[str]:
158
+ """
159
+ Process video file and extract frames.
160
+
161
+ Args:
162
+ input_video: Path to input video file
163
+ target_dir_images: Target directory for extracted frames
164
+ s_time_interval: Sampling FPS (frames per second) for frame extraction
165
+
166
+ Returns:
167
+ List of extracted frame paths
168
+ """
169
+ image_paths = []
170
+
171
+ if isinstance(input_video, dict) and "name" in input_video:
172
+ video_path = input_video["name"]
173
+ else:
174
+ video_path = input_video
175
+
176
+ vs = cv2.VideoCapture(video_path)
177
+ fps = vs.get(cv2.CAP_PROP_FPS)
178
+ frame_interval = max(1, int(fps / s_time_interval)) # Convert FPS to frame interval
179
+
180
+ count = 0
181
+ video_frame_num = 0
182
+ while True:
183
+ gotit, frame = vs.read()
184
+ if not gotit:
185
+ break
186
+ count += 1
187
+ if count % frame_interval == 0:
188
+ image_path = os.path.join(target_dir_images, f"{video_frame_num:06}.png")
189
+ cv2.imwrite(image_path, frame)
190
+ image_paths.append(image_path)
191
+ video_frame_num += 1
192
+
193
+ vs.release()
+ return image_paths
194
+
195
+ def update_gallery_on_upload(
196
+ self,
197
+ input_video: Optional[str],
198
+ input_images: Optional[List],
199
+ s_time_interval: float = 10.0,
200
+ ) -> Tuple[Optional[str], Optional[str], Optional[List], Optional[str]]:
201
+ """
202
+ Handle file uploads and update gallery.
203
+
204
+ Args:
205
+ input_video: Path to input video file
206
+ input_images: List of input image files
207
+ s_time_interval: Sampling FPS (frames per second) for frame extraction
208
+
209
+ Returns:
210
+ Tuple of (reconstruction_output, target_dir, image_paths, log_message)
211
+ """
212
+ if not input_video and not input_images:
213
+ return None, None, None, None
214
+
215
+ target_dir, image_paths = self.handle_uploads(input_video, input_images, s_time_interval)
216
+ return (
217
+ None,
218
+ target_dir,
219
+ image_paths,
220
+ "Upload complete. Click 'Reconstruct' to begin 3D processing.",
221
+ )
222
+
223
+ def load_example_scene(
224
+ self, scene_name: str, examples_dir: str = "examples"
225
+ ) -> Tuple[Optional[str], Optional[str], Optional[List], str]:
226
+ """
227
+ Load a scene from examples directory.
228
+
229
+ Args:
230
+ scene_name: Name of the scene to load
231
+ examples_dir: Path to examples directory
232
+
233
+ Returns:
234
+ Tuple of (reconstruction_output, target_dir, image_paths, log_message)
235
+ """
236
+ from depth_anything_3.app.modules.utils import get_scene_info
237
+
238
+ scenes = get_scene_info(examples_dir)
239
+
240
+ # Find the selected scene
241
+ selected_scene = None
242
+ for scene in scenes:
243
+ if scene["name"] == scene_name:
244
+ selected_scene = scene
245
+ break
246
+
247
+ if selected_scene is None:
248
+ return None, None, None, "Scene not found"
249
+
250
+ # Use fixed directory name for examples (not timestamp-based)
251
+ workspace_dir = os.environ.get("DA3_WORKSPACE_DIR", "gradio_workspace")
252
+ input_images_dir = os.path.join(workspace_dir, "input_images")
253
+ if not os.path.exists(input_images_dir):
254
+ os.makedirs(input_images_dir)
255
+
256
+ # Create a fixed folder name based on scene name
257
+ target_dir = os.path.join(input_images_dir, f"example_{scene_name}")
258
+ target_dir_images = os.path.join(target_dir, "images")
259
+
260
+ # Check if already cached (GLB file exists)
261
+ glb_path = os.path.join(target_dir, "scene.glb")
262
+ is_cached = os.path.exists(glb_path)
263
+
264
+ # Create directory if it doesn't exist
265
+ if not os.path.exists(target_dir):
266
+ os.makedirs(target_dir)
267
+ os.makedirs(target_dir_images)
268
+
269
+ # Copy images if directory is new or empty
270
+ if not os.path.exists(target_dir_images) or len(os.listdir(target_dir_images)) == 0:
271
+ os.makedirs(target_dir_images, exist_ok=True)
272
+ image_paths = []
273
+ for file_path in selected_scene["image_files"]:
274
+ dst_path = os.path.join(target_dir_images, os.path.basename(file_path))
275
+ shutil.copy(file_path, dst_path)
276
+ image_paths.append(dst_path)
277
+ else:
278
+ # Use existing images
279
+ image_paths = sorted(
280
+ [
281
+ os.path.join(target_dir_images, f)
282
+ for f in os.listdir(target_dir_images)
283
+ if f.lower().endswith((".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif"))
284
+ ]
285
+ )
286
+
287
+ # Return cached GLB if available
288
+ if is_cached:
289
+ return (
290
+ glb_path, # Return cached reconstruction
291
+ target_dir, # Set target directory
292
+ image_paths, # Set gallery
293
+ f"Loaded cached scene '{scene_name}' with {selected_scene['num_images']} images.",
294
+ )
295
+ else:
296
+ return (
297
+ None, # No cached reconstruction
298
+ target_dir, # Set target directory
299
+ image_paths, # Set gallery
300
+ (
301
+ f"Loaded scene '{scene_name}' with {selected_scene['num_images']} images. "
302
+ "Click 'Reconstruct' to begin 3D processing."
303
+ ),
304
+ )
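A short usage sketch for FileHandler outside the Gradio app (the video path is a hypothetical example): handle_uploads stages frames into DA3_WORKSPACE_DIR/input_images/session_<timestamp>/images.

from depth_anything_3.app.modules.file_handlers import FileHandler

handler = FileHandler()
# Sample roughly 2 frames per second from the clip into a new session folder.
target_dir, frame_paths = handler.handle_uploads(
    input_video="my_clip.mp4",  # hypothetical local video file
    input_images=None,
    s_time_interval=2.0,
)
print(f"{len(frame_paths)} frames staged in {target_dir}")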
src/depth_anything_3/app/modules/model_inference.py ADDED
@@ -0,0 +1,286 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Model inference module for Depth Anything 3 Gradio app.
17
+
18
+ This module handles all model-related operations including inference,
19
+ data processing, and result preparation.
20
+ """
21
+
22
+ import gc
23
+ import glob
24
+ import os
25
+ from typing import Any, Dict, Optional, Tuple
26
+ import numpy as np
27
+ import torch
28
+
29
+ from depth_anything_3.api import DepthAnything3
30
+ from depth_anything_3.utils.export.glb import export_to_glb
31
+ from depth_anything_3.utils.export.gs import export_to_gs_video
32
+
33
+
34
+ class ModelInference:
35
+ """
36
+ Handles model inference and data processing for Depth Anything 3.
37
+ """
38
+
39
+ def __init__(self):
40
+ """Initialize the model inference handler."""
41
+ self.model = None
42
+
43
+ def initialize_model(self, device: str = "cuda") -> None:
44
+ """
45
+ Initialize the DepthAnything3 model.
46
+
47
+ Args:
48
+ device: Device to load the model on
49
+ """
50
+ if self.model is None:
51
+ # Get model directory from environment variable or use default
52
+ model_dir = os.environ.get(
53
+ "DA3_MODEL_DIR", "/dev/shm/da3_models/DA3HF-VITG-METRIC_VITL"
54
+ )
55
+ self.model = DepthAnything3.from_pretrained(model_dir)
56
+ self.model = self.model.to(device)
57
+ else:
58
+ self.model = self.model.to(device)
59
+
60
+ self.model.eval()
61
+
62
+ def run_inference(
63
+ self,
64
+ target_dir: str,
65
+ filter_black_bg: bool = False,
66
+ filter_white_bg: bool = False,
67
+ process_res_method: str = "upper_bound_resize",
68
+ show_camera: bool = True,
69
+ selected_first_frame: Optional[str] = None,
70
+ save_percentage: float = 30.0,
71
+ num_max_points: int = 1_000_000,
72
+ infer_gs: bool = False,
73
+ gs_trj_mode: str = "extend",
74
+ gs_video_quality: str = "high",
75
+ ) -> Tuple[Any, Dict[int, Dict[str, Any]]]:
76
+ """
77
+ Run DepthAnything3 model inference on images.
78
+
79
+ Args:
80
+ target_dir: Directory containing images
81
83
+ filter_black_bg: Whether to filter black background
84
+ filter_white_bg: Whether to filter white background
85
+ process_res_method: Method for resizing input images
86
+ show_camera: Whether to show camera in 3D view
87
+ selected_first_frame: Selected first frame filename
88
+ save_percentage: Confidence-percentile threshold for filtering points (0-100)
89
+ num_max_points: Maximum number of points to export to GLB
+ infer_gs: Whether to infer 3D Gaussian Splatting
+ gs_trj_mode: Trajectory mode for 3DGS novel-view rendering ("extend" or "smooth")
+ gs_video_quality: Quality preset for the rendered 3DGS video
90
+
91
+ Returns:
92
+ Tuple of (prediction, processed_data)
93
+ """
94
+ print(f"Processing images from {target_dir}")
95
+
96
+ # Device check
97
+ device = "cuda" if torch.cuda.is_available() else "cpu"
98
+ device = torch.device(device)
99
+
100
+ # Initialize model if needed
101
+ self.initialize_model(device)
102
+
103
+ # Get image paths
104
+ print("Loading images...")
105
+ image_folder_path = os.path.join(target_dir, "images")
106
+ all_image_paths = sorted(glob.glob(os.path.join(image_folder_path, "*")))
107
+
108
+ # Filter for image files
109
+ image_extensions = [".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"]
110
+ all_image_paths = [
111
+ path
112
+ for path in all_image_paths
113
+ if any(path.lower().endswith(ext) for ext in image_extensions)
114
+ ]
115
+
116
+ print(f"Found {len(all_image_paths)} images")
117
+ print(f"All image paths: {all_image_paths}")
118
+
119
+ # Apply first frame selection logic
120
+ if selected_first_frame:
121
+ # Find the image with matching filename
122
+ selected_path = None
123
+ for path in all_image_paths:
124
+ if os.path.basename(path) == selected_first_frame:
125
+ selected_path = path
126
+ break
127
+
128
+ if selected_path:
129
+ # Move selected frame to the front
130
+ image_paths = [selected_path] + [
131
+ path for path in all_image_paths if path != selected_path
132
+ ]
133
+ print(f"User selected first frame: {selected_first_frame} -> {selected_path}")
134
+ print(f"Reordered image paths: {image_paths}")
135
+ else:
136
+ # Use default order if no match found
137
+ image_paths = all_image_paths
138
+ print(
139
+ f"Selected frame '{selected_first_frame}' not found in image paths. "
140
+ "Using default order."
141
+ )
142
+ first_frame_display = image_paths[0] if image_paths else "No images"
143
+ print(f"Using default order (first frame): {first_frame_display}")
144
+ else:
145
+ # Use default order (sorted)
146
+ image_paths = all_image_paths
147
+ first_frame_display = image_paths[0] if image_paths else "No images"
148
+ print(f"Using default order (first frame): {first_frame_display}")
149
+
150
+ if len(image_paths) == 0:
151
+ raise ValueError("No images found. Check your upload.")
152
+
153
+ # Map UI options to actual method names
154
+ method_mapping = {"high_res": "lower_bound_resize", "low_res": "upper_bound_resize"}
155
+ actual_method = method_mapping.get(process_res_method, "upper_bound_crop")
156
+
157
+ # Run model inference
158
+ print(f"Running inference with method: {actual_method}")
159
+ with torch.no_grad():
160
+ prediction = self.model.inference(
161
+ image_paths, export_dir=None, process_res_method=actual_method, infer_gs=infer_gs
162
+ )
163
164
+ export_to_glb(
165
+ prediction,
166
+ filter_black_bg=filter_black_bg,
167
+ filter_white_bg=filter_white_bg,
168
+ export_dir=target_dir,
169
+ show_cameras=show_camera,
170
+ conf_thresh_percentile=save_percentage,
171
+ num_max_points=int(num_max_points),
172
+ )
173
+
174
+ # export to gs video if needed
175
+ if infer_gs:
176
+ mode_mapping = {"extend": "extend", "smooth": "interpolate_smooth"}
177
+ print(f"GS mode: {gs_trj_mode}; Backend mode: {mode_mapping[gs_trj_mode]}")
178
+ export_to_gs_video(
179
+ prediction,
180
+ export_dir=target_dir,
181
+ chunk_size=4,
182
+ trj_mode=mode_mapping.get(gs_trj_mode, "extend"),
183
+ enable_tqdm=True,
184
+ vis_depth="hcat",
185
+ video_quality=gs_video_quality,
186
+ )
187
+
188
+ # Save predictions.npz for caching metric depth data
189
+ self._save_predictions_cache(target_dir, prediction)
190
+
191
+ # Process results
192
+ processed_data = self._process_results(target_dir, prediction, image_paths)
193
+
194
+ # Clean up
195
+ torch.cuda.empty_cache()
196
+
197
+ return prediction, processed_data
198
+
199
+ def _save_predictions_cache(self, target_dir: str, prediction: Any) -> None:
200
+ """
201
+ Save predictions data to predictions.npz for caching.
202
+
203
+ Args:
204
+ target_dir: Directory to save the cache
205
+ prediction: Model prediction object
206
+ """
207
+ try:
208
+ output_file = os.path.join(target_dir, "predictions.npz")
209
+
210
+ # Build save dict with prediction data
211
+ save_dict = {}
212
+
213
+ # Save processed images if available
214
+ if prediction.processed_images is not None:
215
+ save_dict["images"] = prediction.processed_images
216
+
217
+ # Save depth data
218
+ if prediction.depth is not None:
219
+ save_dict["depths"] = np.round(prediction.depth, 6)
220
+
221
+ # Save confidence if available
222
+ if prediction.conf is not None:
223
+ save_dict["conf"] = np.round(prediction.conf, 2)
224
+
225
+ # Save camera parameters
226
+ if prediction.extrinsics is not None:
227
+ save_dict["extrinsics"] = prediction.extrinsics
228
+ if prediction.intrinsics is not None:
229
+ save_dict["intrinsics"] = prediction.intrinsics
230
+
231
+ # Save to file
232
+ np.savez_compressed(output_file, **save_dict)
233
+ print(f"Saved predictions cache to: {output_file}")
234
+
235
+ except Exception as e:
236
+ print(f"Warning: Failed to save predictions cache: {e}")
237
+
238
+ def _process_results(
239
+ self, target_dir: str, prediction: Any, image_paths: list
240
+ ) -> Dict[int, Dict[str, Any]]:
241
+ """
242
+ Process model results into structured data.
243
+
244
+ Args:
245
+ target_dir: Directory containing results
246
+ prediction: Model prediction object
247
+ image_paths: List of input image paths
248
+
249
+ Returns:
250
+ Dictionary containing processed data for each view
251
+ """
252
+ processed_data = {}
253
+
254
+ # Read generated depth visualization files
255
+ depth_vis_dir = os.path.join(target_dir, "depth_vis")
256
+
257
+ if os.path.exists(depth_vis_dir):
258
+ depth_files = sorted(glob.glob(os.path.join(depth_vis_dir, "*.jpg")))
259
+ for i, depth_file in enumerate(depth_files):
260
+ # Use processed images directly from API
261
+ processed_image = None
262
+ if prediction.processed_images is not None and i < len(
263
+ prediction.processed_images
264
+ ):
265
+ processed_image = prediction.processed_images[i]
266
+
267
+ processed_data[i] = {
268
+ "depth_image": depth_file,
269
+ "image": processed_image,
270
+ "original_image_path": image_paths[i] if i < len(image_paths) else None,
271
+ "depth": prediction.depth[i] if i < len(prediction.depth) else None,
272
+ "intrinsics": (
273
+ prediction.intrinsics[i]
274
+ if prediction.intrinsics is not None and i < len(prediction.intrinsics)
275
+ else None
276
+ ),
277
+ "mask": None, # No mask information available
278
+ }
279
+
280
+ return processed_data
281
+
282
+ def cleanup(self) -> None:
283
+ """Clean up GPU memory."""
284
+ if torch.cuda.is_available():
285
+ torch.cuda.empty_cache()
286
+ gc.collect()
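A hedged sketch of driving ModelInference directly; the checkpoint path and session directory are assumptions, and in the app this is normally invoked through EventHandlers.gradio_demo.

import os
from depth_anything_3.app.modules.model_inference import ModelInference

os.environ.setdefault("DA3_MODEL_DIR", "/path/to/da3_checkpoint")  # assumed local checkpoint
inference = ModelInference()
prediction, processed_data = inference.run_inference(
    target_dir="gradio_workspace/input_images/session_demo",  # must contain an images/ subfolder
    process_res_method="low_res",  # mapped internally to upper_bound_resize
    show_camera=True,
    save_percentage=30.0,
    num_max_points=1_000_000,
    infer_gs=False,
)
print(f"Processed {len(processed_data)} views; scene.glb written to the target directory")
inference.cleanup()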
src/depth_anything_3/app/modules/ui_components.py ADDED
@@ -0,0 +1,474 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ UI components module for Depth Anything 3 Gradio app.
17
+
18
+ This module contains UI component definitions and layout functions.
19
+ """
20
+
21
+ import os
22
+ from typing import Any, Dict, List, Tuple
23
+ import gradio as gr
24
+
25
+ from depth_anything_3.app.modules.utils import get_logo_base64, get_scene_info
26
+
27
+
28
+ class UIComponents:
29
+ """
30
+ Handles UI component creation and layout for the Gradio app.
31
+ """
32
+
33
+ def __init__(self):
34
+ """Initialize the UI components handler."""
35
+
36
+ def create_upload_section(self) -> Tuple[gr.Video, gr.Slider, gr.File, gr.Gallery, gr.Button]:
37
+ """
38
+ Create the upload section with video, images, and gallery components.
39
+
40
+ Returns:
41
+ A tuple of Gradio components: (input_video, s_time_interval, input_images,
42
+ image_gallery, select_first_frame_btn).
43
+ """
44
+ input_video = gr.Video(label="Upload Video", interactive=True)
45
+ s_time_interval = gr.Slider(
46
+ minimum=0.1,
47
+ maximum=60,
48
+ value=10,
49
+ step=0.1,
50
+ label="Sampling FPS (Frames Per Second)",
51
+ interactive=True,
52
+ visible=True,
53
+ )
54
+ input_images = gr.File(file_count="multiple", label="Upload Images", interactive=True)
55
+ image_gallery = gr.Gallery(
56
+ label="Preview",
57
+ columns=4,
58
+ height="300px",
59
+ show_download_button=True,
60
+ object_fit="contain",
61
+ preview=True,
62
+ interactive=False,
63
+ )
64
+
65
+ # Select first frame button (moved below image gallery)
66
+ select_first_frame_btn = gr.Button("Select First Frame", scale=1)
67
+
68
+ return input_video, s_time_interval, input_images, image_gallery, select_first_frame_btn
69
+
70
+ def create_3d_viewer_section(self) -> gr.Model3D:
71
+ """
72
+ Create the 3D viewer component.
73
+
74
+ Returns:
75
+ 3D model viewer component
76
+ """
77
+ return gr.Model3D(
78
+ height=520,
79
+ zoom_speed=0.5,
80
+ pan_speed=0.5,
81
+ clear_color=[0.0, 0.0, 0.0, 0.0],
82
+ key="persistent_3d_viewer",
83
+ elem_id="reconstruction_3d_viewer",
84
+ )
85
+
86
+ def create_nvs_video(self) -> Tuple[gr.Video, gr.Markdown]:
87
+ """
88
+ Create the 3DGS rendered video display component and info message.
89
+
90
+ Returns:
91
+ Tuple of (video component, info message component)
92
+ """
93
+ with gr.Column():
94
+ gs_info = gr.Markdown(
95
+ (
96
+ "‼️ **3D Gaussian Splatting rendering is currently DISABLED.** <br><br><br>"
97
+ "To render novel views from 3DGS, "
98
+ "enable **Infer 3D Gaussian Splatting** below. <br>"
99
+ "Next, in **Visualization Options**, "
100
+ "*optionally* configure the **rendering trajectory** (default: smooth) "
101
+ "and **video quality** (default: low), "
102
+ "then click **Reconstruct**."
103
+ ),
104
+ visible=True,
105
+ height=520,
106
+ )
107
+ gs_video = gr.Video(
108
+ height=520,
109
+ label="3DGS Rendered NVS Video (depth shown for reference only)",
110
+ interactive=False,
111
+ visible=False,
112
+ )
113
+ return gs_video, gs_info
114
+
115
+ def create_depth_section(self) -> Tuple[gr.Button, gr.Dropdown, gr.Button, gr.Image]:
116
+ """
117
+ Create the depth visualization section.
118
+
119
+ Returns:
120
+ A tuple of (prev_depth_btn, depth_view_selector, next_depth_btn, depth_map)
121
+ """
122
+ with gr.Row(elem_classes=["navigation-row"]):
123
+ prev_depth_btn = gr.Button("◀ Previous", size="sm", scale=1)
124
+ depth_view_selector = gr.Dropdown(
125
+ choices=["View 1"],
126
+ value="View 1",
127
+ label="Select View",
128
+ scale=2,
129
+ interactive=True,
130
+ allow_custom_value=True,
131
+ )
132
+ next_depth_btn = gr.Button("Next ▶", size="sm", scale=1)
133
+ depth_map = gr.Image(
134
+ type="numpy",
135
+ label="Colorized Depth Map",
136
+ format="png",
137
+ interactive=False,
138
+ )
139
+
140
+ return prev_depth_btn, depth_view_selector, next_depth_btn, depth_map
141
+
142
+ def create_measure_section(
143
+ self,
144
+ ) -> Tuple[gr.Button, gr.Dropdown, gr.Button, gr.Image, gr.Image, gr.Markdown]:
145
+ """
146
+ Create the measurement section.
147
+
148
+ Returns:
149
+ A tuple of (prev_measure_btn, measure_view_selector, next_measure_btn, measure_image,
150
+ measure_depth_image, measure_text)
151
+ """
152
+ from depth_anything_3.app.css_and_html import MEASURE_INSTRUCTIONS_HTML
153
+
154
+ gr.Markdown(MEASURE_INSTRUCTIONS_HTML)
155
+ with gr.Row(elem_classes=["navigation-row"]):
156
+ prev_measure_btn = gr.Button("◀ Previous", size="sm", scale=1)
157
+ measure_view_selector = gr.Dropdown(
158
+ choices=["View 1"],
159
+ value="View 1",
160
+ label="Select View",
161
+ scale=2,
162
+ interactive=True,
163
+ allow_custom_value=True,
164
+ )
165
+ next_measure_btn = gr.Button("Next ▶", size="sm", scale=1)
166
+ with gr.Row():
167
+ measure_image = gr.Image(
168
+ type="numpy",
169
+ show_label=False,
170
+ format="webp",
171
+ interactive=False,
172
+ sources=[],
173
+ label="RGB Image",
174
+ scale=1,
175
+ height=275,
176
+ )
177
+ measure_depth_image = gr.Image(
178
+ type="numpy",
179
+ show_label=False,
180
+ format="webp",
181
+ interactive=False,
182
+ sources=[],
183
+ label="Depth Visualization (Right Half)",
184
+ scale=1,
185
+ height=275,
186
+ )
187
+ gr.Markdown(
188
+ "**Note:** Images have been adjusted to model processing size. "
189
+ "Click two points on the RGB image to measure distance."
190
+ )
191
+ measure_text = gr.Markdown("")
192
+
193
+ return (
194
+ prev_measure_btn,
195
+ measure_view_selector,
196
+ next_measure_btn,
197
+ measure_image,
198
+ measure_depth_image,
199
+ measure_text,
200
+ )
201
+
202
+ def create_inference_control_section(self) -> Tuple[gr.Dropdown, gr.Checkbox]:
203
+ """
204
+ Create the inference control section (before inference).
205
+
206
+ Returns:
207
+ Tuple of (process_res_method_dropdown, infer_gs)
208
+ """
209
+ with gr.Row():
210
+ process_res_method_dropdown = gr.Dropdown(
211
+ choices=["high_res", "low_res"],
212
+ value="low_res",
213
+ label="Image Processing Method",
214
+ info="low_res for much more images",
215
+ scale=1,
216
+ )
217
+ # Checkbox for 3DGS inference; the info text embeds a red warning icon via a CSS class
218
+ infer_gs = gr.Checkbox(
219
+ label="Infer 3D Gaussian Splatting",
220
+ value=False,
221
+ info=(
222
+ 'Enable novel view rendering from 3DGS (<i class="fas fa-triangle-exclamation '
223
+ 'fa-color-red"></i> requires extra processing time)'
224
+ ),
225
+ scale=1,
226
+ )
227
+
228
+ return (process_res_method_dropdown, infer_gs)
229
+
230
+ def create_display_control_section(
231
+ self,
232
+ ) -> Tuple[
233
+ gr.Checkbox,
234
+ gr.Checkbox,
235
+ gr.Checkbox,
236
+ gr.Slider,
237
+ gr.Slider,
238
+ gr.Dropdown,
239
+ gr.Dropdown,
240
+ gr.Button,
241
+ gr.ClearButton,
242
+ ]:
243
+ """
244
+ Create the display control section (options for visualization).
245
+
246
+ Returns:
247
+ Tuple of display control components including buttons
248
+ """
249
+ with gr.Column():
250
+ # 3DGS options at the top
251
+ with gr.Row():
252
+ gs_trj_mode = gr.Dropdown(
253
+ choices=["smooth", "extend"],
254
+ value="smooth",
255
+ label=("Rendering trajectory for 3DGS viewpoints (requires n_views ≥ 2)"),
256
+ info=("'smooth' for view interpolation; 'extend' for longer trajectory"),
257
+ visible=False, # initially hidden
258
+ )
259
+ gs_video_quality = gr.Dropdown(
260
+ choices=["low", "medium", "high"],
261
+ value="low",
262
+ label=("Video quality for 3DGS rendered outputs"),
263
+ info=("'low' for faster loading speed; 'high' for better visual quality"),
264
+ visible=False, # initially hidden
265
+ )
266
+
267
+ # Reconstruct and Clear buttons (before Visualization Options)
268
+ with gr.Row():
269
+ submit_btn = gr.Button("Reconstruct", scale=1, variant="primary")
270
+ clear_btn = gr.ClearButton(scale=1)
271
+
272
+ gr.Markdown("### Visualization Options: (Click Reconstruct to update)")
273
+ show_cam = gr.Checkbox(label="Show Camera", value=True)
274
+ filter_black_bg = gr.Checkbox(label="Filter Black Background", value=False)
275
+ filter_white_bg = gr.Checkbox(label="Filter White Background", value=False)
276
+ save_percentage = gr.Slider(
277
+ minimum=0,
278
+ maximum=100,
279
+ value=10,
280
+ step=1,
281
+ label="Filter Percentage",
282
+ info="Confidence Threshold (%): Higher values filter more points.",
283
+ )
284
+ num_max_points = gr.Slider(
285
+ minimum=1000,
286
+ maximum=100000,
287
+ value=1000,
288
+ step=1000,
289
+ label="Max Points (K points)",
290
+ info="Maximum number of points to export to GLB (in thousands)",
291
+ )
292
+
293
+ return (
294
+ show_cam,
295
+ filter_black_bg,
296
+ filter_white_bg,
297
+ save_percentage,
298
+ num_max_points,
299
+ gs_trj_mode,
300
+ gs_video_quality,
301
+ submit_btn,
302
+ clear_btn,
303
+ )
304
+
305
+ def create_control_section(
306
+ self,
307
+ ) -> Tuple[
308
+ gr.Button,
309
+ gr.ClearButton,
310
+ gr.Dropdown,
311
+ gr.Checkbox,
312
+ gr.Checkbox,
313
+ gr.Checkbox,
314
+ gr.Checkbox,
315
+ gr.Checkbox,
316
+ gr.Dropdown,
317
+ gr.Checkbox,
318
+ gr.Textbox,
319
+ ]:
320
+ """
321
+ Create the control section with buttons and options.
322
+
323
+ Returns:
324
+ Tuple of control components
325
+ """
326
+ with gr.Row():
327
+ submit_btn = gr.Button("Reconstruct", scale=1, variant="primary")
328
+ clear_btn = gr.ClearButton(
329
+ scale=1,
330
+ )
331
+
332
+ with gr.Row():
333
+ frame_filter = gr.Dropdown(
334
+ choices=["All"], value="All", label="Show Points from Frame"
335
+ )
336
+ with gr.Column():
337
+ gr.Markdown("### Visualization Option: (Click Reconstruct to update)")
338
+ show_cam = gr.Checkbox(label="Show Camera", value=True)
339
+ show_mesh = gr.Checkbox(label="Show Mesh", value=True)
340
+ filter_black_bg = gr.Checkbox(label="Filter Black Background", value=False)
341
+ filter_white_bg = gr.Checkbox(label="Filter White Background", value=False)
342
+ gr.Markdown("### Reconstruction Options: (updated on next run)")
343
+ apply_mask_checkbox = gr.Checkbox(
344
+ label="Apply mask for predicted ambiguous depth classes & edges",
345
+ value=True,
346
+ )
347
+ process_res_method_dropdown = gr.Dropdown(
348
+ choices=[
349
+ "upper_bound_resize",
350
+ "upper_bound_crop",
351
+ "lower_bound_resize",
352
+ "lower_bound_crop",
353
+ ],
354
+ value="upper_bound_resize",
355
+ label="Image Processing Method",
356
+ info="Method for resizing input images",
357
+ )
358
+ save_to_gallery_checkbox = gr.Checkbox(
359
+ label="Save to Gallery",
360
+ value=False,
361
+ info="Save current reconstruction results to gallery directory",
362
+ )
363
+ gallery_name_input = gr.Textbox(
364
+ label="Gallery Name",
365
+ placeholder="Enter a name for the gallery folder",
366
+ value="",
367
+ info="Leave empty for auto-generated name with timestamp",
368
+ )
369
+
370
+ return (
371
+ submit_btn,
372
+ clear_btn,
373
+ frame_filter,
374
+ show_cam,
375
+ show_mesh,
376
+ filter_black_bg,
377
+ filter_white_bg,
378
+ apply_mask_checkbox,
379
+ process_res_method_dropdown,
380
+ save_to_gallery_checkbox,
381
+ gallery_name_input,
382
+ )
383
+
384
+ def create_example_scenes_section(self) -> List[Dict[str, Any]]:
385
+ """
386
+ Create the example scenes section.
387
+
388
+ Returns:
389
+ List of scene information dictionaries
390
+ """
391
+ # Get workspace directory from environment variable
392
+ workspace_dir = os.environ.get("DA3_WORKSPACE_DIR", "gradio_workspace")
393
+ examples_dir = os.path.join(workspace_dir, "examples")
394
+
395
+ # Get scene information
396
+ scenes = get_scene_info(examples_dir)
397
+
398
+ return scenes
399
+
400
+ def create_example_scene_grid(self, scenes: List[Dict[str, Any]]) -> List[gr.Image]:
401
+ """
402
+ Create the example scene grid.
403
+
404
+ Args:
405
+ scenes: List of scene information dictionaries
406
+
407
+ Returns:
408
+ List of scene image components
409
+ """
410
+ scene_components = []
411
+
412
+ if scenes:
413
+ for i in range(0, len(scenes), 4): # Process 4 scenes per row
414
+ with gr.Row():
415
+ for j in range(4):
416
+ scene_idx = i + j
417
+ if scene_idx < len(scenes):
418
+ scene = scenes[scene_idx]
419
+ with gr.Column(scale=1, elem_classes=["clickable-thumbnail"]):
420
+ # Clickable thumbnail
421
+ scene_img = gr.Image(
422
+ value=scene["thumbnail"],
423
+ height=150,
424
+ interactive=False,
425
+ show_label=False,
426
+ elem_id=f"scene_thumb_{scene['name']}",
427
+ sources=[],
428
+ )
429
+ scene_components.append(scene_img)
430
+
431
+ # Scene name and image count as text below thumbnail
432
+ gr.Markdown(
433
+ f"**{scene['name']}** \n {scene['num_images']} images",
434
+ elem_classes=["scene-info"],
435
+ )
436
+ else:
437
+ # Empty column to maintain grid structure
438
+ with gr.Column(scale=1):
439
+ pass
440
+
441
+ return scene_components
442
+
443
+ def create_header_section(self) -> gr.HTML:
444
+ """
445
+ Create the header section with logo and title.
446
+
447
+ Returns:
448
+ Header HTML component
449
+ """
450
+ from depth_anything_3.app.css_and_html import get_header_html
451
+
452
+ return gr.HTML(get_header_html(get_logo_base64()))
453
+
454
+ def create_description_section(self) -> gr.HTML:
455
+ """
456
+ Create the description section.
457
+
458
+ Returns:
459
+ Description HTML component
460
+ """
461
+ from depth_anything_3.app.css_and_html import get_description_html
462
+
463
+ return gr.HTML(get_description_html())
464
+
465
+ def create_acknowledgements_section(self) -> gr.HTML:
466
+ """
467
+ Create the acknowledgements section.
468
+
469
+ Returns:
470
+ Acknowledgements HTML component
471
+ """
472
+ from depth_anything_3.app.css_and_html import get_acknowledgements_html
473
+
474
+ return gr.HTML(get_acknowledgements_html())
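For reference, a minimal sketch (assumption: a standalone layout without the event wiring) of composing these factory methods inside gr.Blocks.

import gradio as gr
from depth_anything_3.app.modules.ui_components import UIComponents

ui = UIComponents()
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            (input_video, s_fps, input_images,
             gallery, select_first_btn) = ui.create_upload_section()
        with gr.Column():
            viewer = ui.create_3d_viewer_section()
    process_res_dropdown, infer_gs = ui.create_inference_control_section()

if __name__ == "__main__":
    demo.launch()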
src/depth_anything_3/app/modules/utils.py ADDED
@@ -0,0 +1,211 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Utility functions for Depth Anything 3 Gradio app.
17
+
18
+ This module contains helper functions for data processing, visualization,
19
+ and file operations.
20
+ """
21
+
22
+ import gc
23
+ import json
24
+ import os
25
+ import shutil
26
+ from datetime import datetime
27
+ from typing import Any, Dict, List, Optional, Tuple
28
+ import numpy as np
29
+ import torch
30
+
31
+
32
+ def create_depth_visualization(depth: np.ndarray) -> Optional[np.ndarray]:
33
+ """
34
+ Create a colored depth visualization.
35
+
36
+ Args:
37
+ depth: Depth array
38
+
39
+ Returns:
40
+ Colored depth visualization or None
41
+ """
42
+ if depth is None:
43
+ return None
44
+
45
+ # Normalize depth to 0-1 range
46
+ depth_min = depth[depth > 0].min() if (depth > 0).any() else 0
47
+ depth_max = depth.max()
48
+
49
+ if depth_max <= depth_min:
50
+ return None
51
+
52
+ # Normalize depth
53
+ depth_norm = (depth - depth_min) / (depth_max - depth_min)
54
+ depth_norm = np.clip(depth_norm, 0, 1)
55
+
56
+ # Apply colormap (using matplotlib's viridis colormap)
57
+ import matplotlib.cm as cm
58
+
59
+ # Convert to colored image
60
+ depth_colored = cm.viridis(depth_norm)[:, :, :3] # Remove alpha channel
61
+ depth_colored = (depth_colored * 255).astype(np.uint8)
62
+
63
+ return depth_colored
64
+
65
+
66
+ def save_to_gallery_func(
67
+ target_dir: str, processed_data: Dict[int, Dict[str, Any]], gallery_name: Optional[str] = None
68
+ ) -> Tuple[bool, str]:
69
+ """
70
+ Save the current reconstruction results to the gallery directory.
71
+
72
+ Args:
73
+ target_dir: Source directory containing reconstruction results
74
+ processed_data: Processed data dictionary
75
+ gallery_name: Name for the gallery folder
76
+
77
+ Returns:
78
+ Tuple of (success, message)
79
+ """
80
+ try:
81
+ # Get gallery directory from environment variable or use default
82
+ gallery_dir = os.environ.get(
83
+ "DA3_GALLERY_DIR",
84
+ "workspace/gallery",
85
+ )
86
+ if not os.path.exists(gallery_dir):
87
+ os.makedirs(gallery_dir)
88
+
89
+ # Use provided name or create a unique name
90
+ if gallery_name is None or gallery_name.strip() == "":
91
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
92
+ gallery_name = f"reconstruction_{timestamp}"
93
+
94
+ gallery_path = os.path.join(gallery_dir, gallery_name)
95
+
96
+ # Check if directory already exists
97
+ if os.path.exists(gallery_path):
98
+ return False, f"Save failed: folder '{gallery_name}' already exists"
99
+
100
+ # Create the gallery directory
101
+ os.makedirs(gallery_path, exist_ok=True)
102
+
103
+ # Copy GLB file
104
+ glb_source = os.path.join(target_dir, "scene.glb")
105
+ glb_dest = os.path.join(gallery_path, "scene.glb")
106
+ if os.path.exists(glb_source):
107
+ shutil.copy2(glb_source, glb_dest)
108
+
109
+ # Copy depth visualization images
110
+ depth_vis_dir = os.path.join(target_dir, "depth_vis")
111
+ if os.path.exists(depth_vis_dir):
112
+ gallery_depth_vis = os.path.join(gallery_path, "depth_vis")
113
+ shutil.copytree(depth_vis_dir, gallery_depth_vis)
114
+
115
+ # Copy original images
116
+ images_source = os.path.join(target_dir, "images")
117
+ if os.path.exists(images_source):
118
+ gallery_images = os.path.join(gallery_path, "images")
119
+ shutil.copytree(images_source, gallery_images)
120
+
121
+ scene_preview_source = os.path.join(target_dir, "scene.jpg")
122
+ scene_preview_dest = os.path.join(gallery_path, "scene.jpg")
123
+ if os.path.exists(scene_preview_source): shutil.copy2(scene_preview_source, scene_preview_dest)
124
+
125
+ # Save metadata
126
+ metadata = {
127
+ "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
128
+ "num_images": len(processed_data) if processed_data else 0,
129
+ "gallery_name": gallery_name,
130
+ }
131
+
132
+ with open(os.path.join(gallery_path, "metadata.json"), "w") as f:
133
+ json.dump(metadata, f, indent=2)
134
+
135
+ print(f"Saved reconstruction to gallery: {gallery_path}")
136
+ return True, f"Save successful: saved to {gallery_path}"
137
+
138
+ except Exception as e:
139
+ print(f"Error saving to gallery: {e}")
140
+ return False, f"Save failed: {str(e)}"
141
+
142
+
143
+ def get_scene_info(examples_dir: str) -> List[Dict[str, Any]]:
144
+ """
145
+ Get information about scenes in the examples directory.
146
+
147
+ Args:
148
+ examples_dir: Path to examples directory
149
+
150
+ Returns:
151
+ List of scene information dictionaries
152
+ """
153
+ import glob
154
+
155
+ scenes = []
156
+ if not os.path.exists(examples_dir):
157
+ return scenes
158
+
159
+ for scene_folder in sorted(os.listdir(examples_dir)):
160
+ scene_path = os.path.join(examples_dir, scene_folder)
161
+ if os.path.isdir(scene_path):
162
+ # Find all image files in the scene folder
163
+ image_extensions = ["*.jpg", "*.jpeg", "*.png", "*.bmp", "*.tiff", "*.tif"]
164
+ image_files = []
165
+ for ext in image_extensions:
166
+ image_files.extend(glob.glob(os.path.join(scene_path, ext)))
167
+ image_files.extend(glob.glob(os.path.join(scene_path, ext.upper())))
168
+
169
+ if image_files:
170
+ # Sort images and get the first one for thumbnail
171
+ image_files = sorted(image_files)
172
+ first_image = image_files[0]
173
+ num_images = len(image_files)
174
+
175
+ scenes.append(
176
+ {
177
+ "name": scene_folder,
178
+ "path": scene_path,
179
+ "thumbnail": first_image,
180
+ "num_images": num_images,
181
+ "image_files": image_files,
182
+ }
183
+ )
184
+
185
+ return scenes
186
+
187
+
188
+ def cleanup_memory() -> None:
189
+ """Clean up GPU memory and garbage collect."""
190
+ gc.collect()
191
+ if torch.cuda.is_available():
192
+ torch.cuda.empty_cache()
193
+
194
+
195
+ def get_logo_base64() -> Optional[str]:
196
+ """
197
+ Convert WAI logo to base64 for embedding in HTML.
198
+
199
+ Returns:
200
+ Base64 encoded logo string or None
201
+ """
202
+ import base64
203
+
204
+ logo_path = "examples/WAI-Logo/wai_logo.png"
205
+ try:
206
+ with open(logo_path, "rb") as img_file:
207
+ img_data = img_file.read()
208
+ base64_str = base64.b64encode(img_data).decode()
209
+ return f"data:image/png;base64,{base64_str}"
210
+ except FileNotFoundError:
211
+ return None
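
The helpers above can also be exercised outside the Gradio app. A minimal sketch (the synthetic depth ramp, the output filename, and the use of Pillow are illustrative assumptions, not part of the module):

```python
import numpy as np
from PIL import Image

from depth_anything_3.app.modules.utils import (
    create_depth_visualization,
    get_scene_info,
)

# Synthetic depth ramp standing in for real model output (values in metres).
depth = np.linspace(0.5, 5.0, 256 * 256, dtype=np.float32).reshape(256, 256)

colored = create_depth_visualization(depth)  # uint8 RGB, viridis colormap
if colored is not None:
    Image.fromarray(colored).save("depth_vis.png")

# Enumerate example scenes the same way the thumbnail grid does.
for scene in get_scene_info("gradio_workspace/examples"):
    print(scene["name"], scene["num_images"])
```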
src/depth_anything_3/app/modules/visualization.py ADDED
@@ -0,0 +1,434 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Visualization module for Depth Anything 3 Gradio app.
17
+
18
+ This module handles visualization updates, navigation, and measurement functionality.
19
+ """
20
+
21
+ import os
22
+ from typing import Any, Dict, List, Optional, Tuple
23
+ import cv2
24
+ import gradio as gr
25
+ import numpy as np
26
+
27
+
28
+ class VisualizationHandler:
29
+ """
30
+ Handles visualization updates and navigation for the Gradio app.
31
+ """
32
+
33
+ def __init__(self):
34
+ """Initialize the visualization handler."""
35
+
36
+ def update_view_selectors(
37
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]]
38
+ ) -> Tuple[gr.Dropdown, gr.Dropdown]:
39
+ """
40
+ Update view selector dropdowns based on available views.
41
+
42
+ Args:
43
+ processed_data: Processed data dictionary
44
+
45
+ Returns:
46
+ Tuple of (depth_view_selector, measure_view_selector)
47
+ """
48
+ if processed_data is None or len(processed_data) == 0:
49
+ choices = ["View 1"]
50
+ else:
51
+ num_views = len(processed_data)
52
+ choices = [f"View {i + 1}" for i in range(num_views)]
53
+
54
+ return (
55
+ gr.Dropdown(choices=choices, value=choices[0]), # depth_view_selector
56
+ gr.Dropdown(choices=choices, value=choices[0]), # measure_view_selector
57
+ )
58
+
59
+ def get_view_data_by_index(
60
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]], view_index: int
61
+ ) -> Optional[Dict[str, Any]]:
62
+ """
63
+ Get view data by index, handling bounds.
64
+
65
+ Args:
66
+ processed_data: Processed data dictionary
67
+ view_index: Index of the view to get
68
+
69
+ Returns:
70
+ View data dictionary or None
71
+ """
72
+ if processed_data is None or len(processed_data) == 0:
73
+ return None
74
+
75
+ view_keys = list(processed_data.keys())
76
+ if view_index < 0 or view_index >= len(view_keys):
77
+ view_index = 0
78
+
79
+ return processed_data[view_keys[view_index]]
80
+
81
+ def update_depth_view(
82
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]], view_index: int
83
+ ) -> Optional[str]:
84
+ """
85
+ Update depth view for a specific view index.
86
+
87
+ Args:
88
+ processed_data: Processed data dictionary
89
+ view_index: Index of the view to update
90
+
91
+ Returns:
92
+ Path to depth visualization image or None
93
+ """
94
+ view_data = self.get_view_data_by_index(processed_data, view_index)
95
+ if view_data is None or view_data.get("depth_image") is None:
96
+ return None
97
+
98
+ # Return the depth visualization image directly
99
+ return view_data["depth_image"]
100
+
101
+ def navigate_depth_view(
102
+ self,
103
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
104
+ current_selector_value: str,
105
+ direction: int,
106
+ ) -> Tuple[str, Optional[str]]:
107
+ """
108
+ Navigate depth view (direction: -1 for previous, +1 for next).
109
+
110
+ Args:
111
+ processed_data: Processed data dictionary
112
+ current_selector_value: Current selector value
113
+ direction: Direction to navigate (-1 for previous, +1 for next)
114
+
115
+ Returns:
116
+ Tuple of (new_selector_value, depth_vis)
117
+ """
118
+ if processed_data is None or len(processed_data) == 0:
119
+ return "View 1", None
120
+
121
+ # Parse current view number
122
+ try:
123
+ current_view = int(current_selector_value.split()[1]) - 1
124
+ except: # noqa
125
+ current_view = 0
126
+
127
+ num_views = len(processed_data)
128
+ new_view = (current_view + direction) % num_views
129
+
130
+ new_selector_value = f"View {new_view + 1}"
131
+ depth_vis = self.update_depth_view(processed_data, new_view)
132
+
133
+ return new_selector_value, depth_vis
134
+
135
+ def update_measure_view(
136
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]], view_index: int
137
+ ) -> Tuple[Optional[np.ndarray], Optional[np.ndarray], List]:
138
+ """
139
+ Update measure view for a specific view index.
140
+
141
+ Args:
142
+ processed_data: Processed data dictionary
143
+ view_index: Index of the view to update
144
+
145
+ Returns:
146
+ Tuple of (measure_image, depth_right_half, measure_points)
147
+ """
148
+ view_data = self.get_view_data_by_index(processed_data, view_index)
149
+ if view_data is None:
150
+ return None, None, [] # image, depth_right_half, measure_points
151
+
152
+ # Get the processed (resized) image
153
+ if "image" in view_data and view_data["image"] is not None:
154
+ image = view_data["image"].copy()
155
+ else:
156
+ return None, None, []
157
+
158
+ # Ensure image is in uint8 format
159
+ if image.dtype != np.uint8:
160
+ if image.max() <= 1.0:
161
+ image = (image * 255).astype(np.uint8)
162
+ else:
163
+ image = image.astype(np.uint8)
164
+
165
+ # Extract right half of the depth visualization (pure depth part)
166
+ depth_image_path = view_data.get("depth_image", None)
167
+ depth_right_half = None
168
+
169
+ if depth_image_path and os.path.exists(depth_image_path):
170
+ try:
171
+ # Load the combined depth visualization image
172
+ depth_combined = cv2.imread(depth_image_path)
173
+ if depth_combined is not None:
174
+ depth_combined = cv2.cvtColor(depth_combined, cv2.COLOR_BGR2RGB)
175
+ height, width = depth_combined.shape[:2]
176
+ # Extract right half (depth visualization part)
177
+ depth_right_half = depth_combined[:, width // 2 :]
178
+ except Exception as e:
179
+ print(f"Error extracting depth right half: {e}")
180
+
181
+ return image, depth_right_half, []
182
+
183
+ def navigate_measure_view(
184
+ self,
185
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
186
+ current_selector_value: str,
187
+ direction: int,
188
+ ) -> Tuple[str, Optional[np.ndarray], Optional[str], List]:
189
+ """
190
+ Navigate measure view (direction: -1 for previous, +1 for next).
191
+
192
+ Args:
193
+ processed_data: Processed data dictionary
194
+ current_selector_value: Current selector value
195
+ direction: Direction to navigate (-1 for previous, +1 for next)
196
+
197
+ Returns:
198
+ Tuple of (new_selector_value, measure_image, depth_right_half, measure_points)
199
+ """
200
+ if processed_data is None or len(processed_data) == 0:
201
+ return "View 1", None, None, []
202
+
203
+ # Parse current view number
204
+ try:
205
+ current_view = int(current_selector_value.split()[1]) - 1
206
+ except: # noqa
207
+ current_view = 0
208
+
209
+ num_views = len(processed_data)
210
+ new_view = (current_view + direction) % num_views
211
+
212
+ new_selector_value = f"View {new_view + 1}"
213
+ measure_image, depth_right_half, measure_points = self.update_measure_view(
214
+ processed_data, new_view
215
+ )
216
+
217
+ return new_selector_value, measure_image, depth_right_half, measure_points
218
+
219
+ def populate_visualization_tabs(
220
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]]
221
+ ) -> Tuple[Optional[str], Optional[np.ndarray], Optional[str], List]:
222
+ """
223
+ Populate the depth and measure tabs with processed data.
224
+
225
+ Args:
226
+ processed_data: Processed data dictionary
227
+
228
+ Returns:
229
+ Tuple of (depth_vis, measure_img, depth_right_half, measure_points)
230
+ """
231
+ if processed_data is None or len(processed_data) == 0:
232
+ return None, None, None, []
233
+
234
+ # Use update function to get depth visualization
235
+ depth_vis = self.update_depth_view(processed_data, 0)
236
+ measure_img, depth_right_half, _ = self.update_measure_view(processed_data, 0)
237
+
238
+ return depth_vis, measure_img, depth_right_half, []
239
+
240
+ def reset_measure(
241
+ self, processed_data: Optional[Dict[int, Dict[str, Any]]]
242
+ ) -> Tuple[Optional[np.ndarray], List, str]:
243
+ """
244
+ Reset measure points.
245
+
246
+ Args:
247
+ processed_data: Processed data dictionary
248
+
249
+ Returns:
250
+ Tuple of (image, measure_points, text)
251
+ """
252
+ if processed_data is None or len(processed_data) == 0:
253
+ return None, [], ""
254
+
255
+ # Return the first view image
256
+ first_view = list(processed_data.values())[0]
257
+ return first_view["image"], [], ""
258
+
259
+ def measure(
260
+ self,
261
+ processed_data: Optional[Dict[int, Dict[str, Any]]],
262
+ measure_points: List,
263
+ current_view_selector: str,
264
+ event: gr.SelectData,
265
+ ) -> List:
266
+ """
267
+ Handle measurement on images.
268
+
269
+ Args:
270
+ processed_data: Processed data dictionary
271
+ measure_points: List of current measure points
272
+ current_view_selector: Current view selector value
273
+ event: Gradio select event
274
+
275
+ Returns:
276
+ List of [image, depth_right_half, measure_points, text]
277
+ """
278
+ try:
279
+ print(f"Measure function called with selector: {current_view_selector}")
280
+
281
+ if processed_data is None or len(processed_data) == 0:
282
+ return [None, [], "No data available"]
283
+
284
+ # Use the currently selected view instead of always using the first view
285
+ try:
286
+ current_view_index = int(current_view_selector.split()[1]) - 1
287
+ except: # noqa
288
+ current_view_index = 0
289
+
290
+ print(f"Using view index: {current_view_index}")
291
+
292
+ # Get view data safely
293
+ if current_view_index < 0 or current_view_index >= len(processed_data):
294
+ current_view_index = 0
295
+
296
+ view_keys = list(processed_data.keys())
297
+ current_view = processed_data[view_keys[current_view_index]]
298
+
299
+ if current_view is None:
300
+ return [None, [], "No view data available"]
301
+
302
+ point2d = event.index[0], event.index[1]
303
+ print(f"Clicked point: {point2d}")
304
+
305
+ measure_points.append(point2d)
306
+
307
+ # Get image and depth visualization
308
+ image, depth_right_half, _ = self.update_measure_view(
309
+ processed_data, current_view_index
310
+ )
311
+ if image is None:
312
+ return [None, [], "No image available"]
313
+
314
+ image = image.copy()
315
+
316
+ # Ensure image is in uint8 format for proper cv2 operations
317
+ try:
318
+ if image.dtype != np.uint8:
319
+ if image.max() <= 1.0:
320
+ # Image is in [0, 1] range, convert to [0, 255]
321
+ image = (image * 255).astype(np.uint8)
322
+ else:
323
+ # Image is already in [0, 255] range
324
+ image = image.astype(np.uint8)
325
+ except Exception as e:
326
+ print(f"Image conversion error: {e}")
327
+ return [None, [], f"Image conversion error: {e}"]
328
+
329
+ # Draw circles for points
330
+ try:
331
+ for p in measure_points:
332
+ if 0 <= p[0] < image.shape[1] and 0 <= p[1] < image.shape[0]:
333
+ image = cv2.circle(image, p, radius=5, color=(255, 0, 0), thickness=2)
334
+ except Exception as e:
335
+ print(f"Drawing error: {e}")
336
+ return [None, [], f"Drawing error: {e}"]
337
+
338
+ # Get depth information from processed_data
339
+ depth_text = ""
340
+ try:
341
+ for i, p in enumerate(measure_points):
342
+ if (
343
+ current_view["depth"] is not None
344
+ and 0 <= p[1] < current_view["depth"].shape[0]
345
+ and 0 <= p[0] < current_view["depth"].shape[1]
346
+ ):
347
+ d = current_view["depth"][p[1], p[0]]
348
+ depth_text += f"- **P{i + 1} depth: {d:.2f}m**\n"
349
+ else:
350
+ depth_text += f"- **P{i + 1}: Click position ({p[0]}, {p[1]}) - No depth information**\n" # noqa: E501
351
+ except Exception as e:
352
+ print(f"Depth text error: {e}")
353
+ depth_text = f"Error computing depth: {e}\n"
354
+
355
+ if len(measure_points) == 2:
356
+ try:
357
+ point1, point2 = measure_points
358
+ # Draw line
359
+ if (
360
+ 0 <= point1[0] < image.shape[1]
361
+ and 0 <= point1[1] < image.shape[0]
362
+ and 0 <= point2[0] < image.shape[1]
363
+ and 0 <= point2[1] < image.shape[0]
364
+ ):
365
+ image = cv2.line(image, point1, point2, color=(255, 0, 0), thickness=2)
366
+
367
+ # Compute 3D distance using depth information and camera intrinsics
368
+ distance_text = "- **Distance: Unable to calculate 3D distance**"
369
+ if (
370
+ current_view["depth"] is not None
371
+ and 0 <= point1[1] < current_view["depth"].shape[0]
372
+ and 0 <= point1[0] < current_view["depth"].shape[1]
373
+ and 0 <= point2[1] < current_view["depth"].shape[0]
374
+ and 0 <= point2[0] < current_view["depth"].shape[1]
375
+ ):
376
+ try:
377
+ # Get depth values at the two points
378
+ d1 = current_view["depth"][point1[1], point1[0]]
379
+ d2 = current_view["depth"][point2[1], point2[0]]
380
+
381
+ # Convert 2D pixel coordinates to 3D world coordinates
382
+ if current_view["intrinsics"] is not None:
383
+ # Get camera intrinsics
384
+ K = current_view["intrinsics"] # 3x3 intrinsic matrix
385
+ fx, fy = K[0, 0], K[1, 1] # focal lengths
386
+ cx, cy = K[0, 2], K[1, 2] # principal point
387
+
388
+ # Convert pixel coordinates to normalized camera coordinates
389
+ # Point 1: (u1, v1) -> (x1, y1, z1)
390
+ u1, v1 = point1[0], point1[1]
391
+ x1 = (u1 - cx) * d1 / fx
392
+ y1 = (v1 - cy) * d1 / fy
393
+ z1 = d1
394
+
395
+ # Point 2: (u2, v2) -> (x2, y2, z2)
396
+ u2, v2 = point2[0], point2[1]
397
+ x2 = (u2 - cx) * d2 / fx
398
+ y2 = (v2 - cy) * d2 / fy
399
+ z2 = d2
400
+
401
+ # Calculate 3D Euclidean distance
402
+ p1_3d = np.array([x1, y1, z1])
403
+ p2_3d = np.array([x2, y2, z2])
404
+ distance_3d = np.linalg.norm(p1_3d - p2_3d)
405
+
406
+ distance_text = f"- **Distance: {distance_3d:.2f}m**"
407
+ else:
408
+ # Fallback to simplified calculation if no intrinsics
409
+ pixel_distance = np.sqrt(
410
+ (point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2
411
+ )
412
+ avg_depth = (d1 + d2) / 2
413
+ scale_factor = avg_depth / 1000 # Rough scaling factor
414
+ estimated_3d_distance = pixel_distance * scale_factor
415
+ distance_text = f"- **Distance: {estimated_3d_distance:.2f}m (estimated, no intrinsics)**" # noqa: E501
416
+
417
+ except Exception as e:
418
+ print(f"Distance computation error: {e}")
419
+ distance_text = f"- **Distance computation error: {e}**"
420
+
421
+ measure_points = []
422
+ text = depth_text + distance_text
423
+ print(f"Measurement complete: {text}")
424
+ return [image, depth_right_half, measure_points, text]
425
+ except Exception as e:
426
+ print(f"Final measurement error: {e}")
427
+ return [None, [], f"Measurement error: {e}"]
428
+ else:
429
+ print(f"Single point measurement: {depth_text}")
430
+ return [image, depth_right_half, measure_points, depth_text]
431
+
432
+ except Exception as e:
433
+ print(f"Overall measure function error: {e}")
434
+ return [None, [], f"Measure function error: {e}"]
src/depth_anything_3/cfg.py ADDED
@@ -0,0 +1,144 @@
1
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Configuration utility functions
17
+ """
18
+
19
+ import importlib
20
+ from pathlib import Path
21
+ from typing import Any, Callable, List, Union
22
+ from omegaconf import DictConfig, ListConfig, OmegaConf
23
+
24
+ try:
25
+ OmegaConf.register_new_resolver("eval", eval)
26
+ except Exception as e:
27
+ # the resolver may already be registered (e.g., on re-import); log and continue
28
+ print(f"Error registering eval resolver: {e}")
29
+
30
+
31
+ def load_config(path: str, argv: List[str] = None) -> Union[DictConfig, ListConfig]:
32
+ """
33
+ Load a configuration. Will resolve inheritance.
34
+ Supports both file paths and module paths (e.g., depth_anything_3.configs.giant).
35
+ """
36
+ # Check if path is a module path (contains dots but no slashes and doesn't end with .yaml)
37
+ if "." in path and "/" not in path and not path.endswith(".yaml"):
38
+ # It's a module path, load from package resources
39
+ path_parts = path.split(".")[1:]
40
+ config_path = Path(__file__).resolve().parent
41
+ for part in path_parts:
42
+ config_path = config_path.joinpath(part)
43
+ config_path = config_path.with_suffix(".yaml")
44
+ config = OmegaConf.load(str(config_path))
45
+ else:
46
+ # It's a file path (absolute, relative, or with .yaml extension)
47
+ config = OmegaConf.load(path)
48
+
49
+ if argv is not None:
50
+ config_argv = OmegaConf.from_dotlist(argv)
51
+ config = OmegaConf.merge(config, config_argv)
52
+ config = resolve_recursive(config, resolve_inheritance)
53
+ return config
54
+
55
+
56
+ def resolve_recursive(
57
+ config: Any,
58
+ resolver: Callable[[Union[DictConfig, ListConfig]], Union[DictConfig, ListConfig]],
59
+ ) -> Any:
60
+ config = resolver(config)
61
+ if isinstance(config, DictConfig):
62
+ for k in config.keys():
63
+ v = config.get(k)
64
+ if isinstance(v, (DictConfig, ListConfig)):
65
+ config[k] = resolve_recursive(v, resolver)
66
+ if isinstance(config, ListConfig):
67
+ for i in range(len(config)):
68
+ v = config.get(i)
69
+ if isinstance(v, (DictConfig, ListConfig)):
70
+ config[i] = resolve_recursive(v, resolver)
71
+ return config
72
+
73
+
74
+ def resolve_inheritance(config: Union[DictConfig, ListConfig]) -> Any:
75
+ """
76
+ Recursively resolve inheritance if the config contains:
77
+ __inherit__: path/to/parent.yaml or a ListConfig of such paths.
78
+ """
79
+ if isinstance(config, DictConfig):
80
+ inherit = config.pop("__inherit__", None)
81
+
82
+ if inherit:
83
+ inherit_list = inherit if isinstance(inherit, ListConfig) else [inherit]
84
+
85
+ parent_config = None
86
+ for parent_path in inherit_list:
87
+ assert isinstance(parent_path, str)
88
+ parent_config = (
89
+ load_config(parent_path)
90
+ if parent_config is None
91
+ else OmegaConf.merge(parent_config, load_config(parent_path))
92
+ )
93
+
94
+ if len(config.keys()) > 0:
95
+ config = OmegaConf.merge(parent_config, config)
96
+ else:
97
+ config = parent_config
98
+ return config
99
+
100
+
101
+ def import_item(path: str, name: str) -> Any:
102
+ """
103
+ Import a python item. Example: import_item("path.to.file", "MyClass") -> MyClass
104
+ """
105
+ return getattr(importlib.import_module(path), name)
106
+
107
+
108
+ def create_object(config: DictConfig) -> Any:
109
+ """
110
+ Create an object from config.
111
+ The config is expected to contain the following:
112
+ __object__:
113
+ path: path.to.module
114
+ name: MyClass
115
+ args: as_config | as_params (defaults to as_config)
116
+ """
117
+ config = DictConfig(config)
118
+ item = import_item(
119
+ path=config.__object__.path,
120
+ name=config.__object__.name,
121
+ )
122
+ args = config.__object__.get("args", "as_config")
123
+ if args == "as_config":
124
+ return item(config)
125
+ if args == "as_params":
126
+ config = OmegaConf.to_object(config)
127
+ config.pop("__object__")
128
+ return item(**config)
129
+ raise NotImplementedError(f"Unknown args type: {args}")
130
+
131
+
132
+ def create_dataset(path: str, *args, **kwargs) -> Any:
133
+ """
134
+ Create a dataset. Requires the file to contain a "create_dataset" function.
135
+ """
136
+ return import_item(path, "create_dataset")(*args, **kwargs)
137
+
138
+
139
+ def to_dict_recursive(config_obj):
140
+ if isinstance(config_obj, DictConfig):
141
+ return {k: to_dict_recursive(v) for k, v in config_obj.items()}
142
+ elif isinstance(config_obj, ListConfig):
143
+ return [to_dict_recursive(item) for item in config_obj]
144
+ return config_obj
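
cfg.py combines three conventions: `__inherit__` merges one or more parent YAMLs into a child, `__object__` tells `create_object` which class to import and whether to pass the whole config (`as_config`) or its keys as keyword arguments (`as_params`), and dotlist overrides behave like CLI flags merged on top. A hedged sketch of how a child config might be loaded and instantiated (the child file name and the override are illustrative, not shipped files):

```python
from depth_anything_3.cfg import create_object, load_config

# child.yaml (illustrative):
#   __inherit__: src/depth_anything_3/configs/da3-base.yaml
#   net:
#     out_layers: [3, 5, 7, 11]
config = load_config("child.yaml", argv=["net.cat_token=false"])

# The inherited config carries
#   __object__: {path: depth_anything_3.model.da3, name: DepthAnything3Net, args: as_params}
# so create_object imports DepthAnything3Net and calls it with the config keys as kwargs.
model = create_object(config)
```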
src/depth_anything_3/cli.py ADDED
@@ -0,0 +1,742 @@
1
+ # flake8: noqa: E402
2
+ # Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
3
+ #
4
+ # Licensed under the Apache License, Version 2.0 (the "License");
5
+ # you may not use this file except in compliance with the License.
6
+ # You may obtain a copy of the License at
7
+ #
8
+ # http://www.apache.org/licenses/LICENSE-2.0
9
+ #
10
+ # Unless required by applicable law or agreed to in writing, software
11
+ # distributed under the License is distributed on an "AS IS" BASIS,
12
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ # See the License for the specific language governing permissions and
14
+ # limitations under the License.
15
+ """
16
+ Refactored Depth Anything 3 CLI
17
+ Clean, modular command-line interface
18
+ """
19
+
20
+ from __future__ import annotations
21
+
22
+ import os
23
+ import typer
24
+
25
+ from depth_anything_3.services import start_server
26
+ from depth_anything_3.services.gallery import gallery as gallery_main
27
+ from depth_anything_3.services.inference_service import run_inference
28
+ from depth_anything_3.services.input_handlers import (
29
+ ColmapHandler,
30
+ ImageHandler,
31
+ ImagesHandler,
32
+ InputHandler,
33
+ VideoHandler,
34
+ parse_export_feat,
35
+ )
36
+ from depth_anything_3.utils.constants import DEFAULT_EXPORT_DIR, DEFAULT_GALLERY_DIR, DEFAULT_GRADIO_DIR, DEFAULT_MODEL
37
+
38
+ os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
39
+
40
+ app = typer.Typer(help="Depth Anything 3 - Video depth estimation CLI", add_completion=False)
41
+
42
+
43
+ # ============================================================================
44
+ # Input type detection utilities
45
+ # ============================================================================
46
+
47
+ # Supported file extensions
48
+ IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tiff", ".tif"}
49
+ VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm", ".m4v"}
50
+
51
+
52
+ def detect_input_type(input_path: str) -> str:
53
+ """
54
+ Detect input type from path.
55
+
56
+ Returns:
57
+ - "image": Single image file
58
+ - "images": Directory containing images
59
+ - "video": Video file
60
+ - "colmap": COLMAP directory structure
61
+ - "unknown": Cannot determine type
62
+ """
63
+ if not os.path.exists(input_path):
64
+ return "unknown"
65
+
66
+ # Check if it's a file
67
+ if os.path.isfile(input_path):
68
+ ext = os.path.splitext(input_path)[1].lower()
69
+ if ext in IMAGE_EXTENSIONS:
70
+ return "image"
71
+ elif ext in VIDEO_EXTENSIONS:
72
+ return "video"
73
+ return "unknown"
74
+
75
+ # Check if it's a directory
76
+ if os.path.isdir(input_path):
77
+ # Check for COLMAP structure
78
+ images_dir = os.path.join(input_path, "images")
79
+ sparse_dir = os.path.join(input_path, "sparse")
80
+
81
+ if os.path.isdir(images_dir) and os.path.isdir(sparse_dir):
82
+ return "colmap"
83
+
84
+ # Check if directory contains image files
85
+ for item in os.listdir(input_path):
86
+ item_path = os.path.join(input_path, item)
87
+ if os.path.isfile(item_path):
88
+ ext = os.path.splitext(item)[1].lower()
89
+ if ext in IMAGE_EXTENSIONS:
90
+ return "images"
91
+
92
+ return "unknown"
93
+
94
+ return "unknown"
95
+
96
+
97
+ # ============================================================================
98
+ # Common parameters and configuration
99
+ # ============================================================================
100
+
101
+ # ============================================================================
102
+ # Inference commands
103
+ # ============================================================================
104
+
105
+
106
+ @app.command()
107
+ def auto(
108
+ input_path: str = typer.Argument(
109
+ ..., help="Path to input (image, directory, video, or COLMAP)"
110
+ ),
111
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
112
+ export_dir: str = typer.Option(DEFAULT_EXPORT_DIR, help="Export directory"),
113
+ export_format: str = typer.Option("glb", help="Export format"),
114
+ device: str = typer.Option("cuda", help="Device to use"),
115
+ use_backend: bool = typer.Option(False, help="Use backend service for inference"),
116
+ backend_url: str = typer.Option(
117
+ "http://localhost:8008", help="Backend URL (default: http://localhost:8008)"
118
+ ),
119
+ process_res: int = typer.Option(504, help="Processing resolution"),
120
+ process_res_method: str = typer.Option(
121
+ "upper_bound_resize", help="Processing resolution method"
122
+ ),
123
+ export_feat: str = typer.Option(
124
+ "",
125
+ help="[FEAT_VIS]Export features from specified layers using comma-separated indices (e.g., '0,1,2').",
126
+ ),
127
+ auto_cleanup: bool = typer.Option(
128
+ False, help="Automatically clean export directory if it exists (no prompt)"
129
+ ),
130
+ # Video-specific options
131
+ fps: float = typer.Option(1.0, help="[Video] Sampling FPS for frame extraction"),
132
+ # COLMAP-specific options
133
+ sparse_subdir: str = typer.Option(
134
+ "", help="[COLMAP] Sparse reconstruction subdirectory (e.g., '0' for sparse/0/)"
135
+ ),
136
+ align_to_input_ext_scale: bool = typer.Option(
137
+ True, help="[COLMAP] Align prediction to input extrinsics scale"
138
+ ),
139
+ # GLB export options
140
+ conf_thresh_percentile: float = typer.Option(
141
+ 40.0, help="[GLB] Lower percentile for adaptive confidence threshold"
142
+ ),
143
+ num_max_points: int = typer.Option(
144
+ 1_000_000, help="[GLB] Maximum number of points in the point cloud"
145
+ ),
146
+ show_cameras: bool = typer.Option(
147
+ True, help="[GLB] Show camera wireframes in the exported scene"
148
+ ),
149
+ # Feat_vis export options
150
+ feat_vis_fps: int = typer.Option(15, help="[FEAT_VIS] Frame rate for output video"),
151
+ ):
152
+ """
153
+ Automatically detect input type and run appropriate processing.
154
+
155
+ Supports:
156
+ - Single image file (.jpg, .png, etc.)
157
+ - Directory of images
158
+ - Video file (.mp4, .avi, etc.)
159
+ - COLMAP directory (with 'images' and 'sparse' subdirectories)
160
+ """
161
+ # Detect input type
162
+ input_type = detect_input_type(input_path)
163
+
164
+ if input_type == "unknown":
165
+ typer.echo(f"❌ Error: Cannot determine input type for: {input_path}", err=True)
166
+ typer.echo("Supported inputs:", err=True)
167
+ typer.echo(" - Single image file (.jpg, .png, etc.)", err=True)
168
+ typer.echo(" - Directory containing images", err=True)
169
+ typer.echo(" - Video file (.mp4, .avi, etc.)", err=True)
170
+ typer.echo(" - COLMAP directory (with 'images/' and 'sparse/' subdirectories)", err=True)
171
+ raise typer.Exit(1)
172
+
173
+ # Display detected type
174
+ typer.echo(f"🔍 Detected input type: {input_type.upper()}")
175
+ typer.echo(f"📁 Input path: {input_path}")
176
+ typer.echo()
177
+
178
+ # Determine backend URL based on use_backend flag
179
+ final_backend_url = backend_url if use_backend else None
180
+
181
+ # Parse export_feat parameter
182
+ export_feat_layers = parse_export_feat(export_feat)
183
+
184
+ # Route to appropriate handler
185
+ if input_type == "image":
186
+ typer.echo("Processing single image...")
187
+ # Process input
188
+ image_files = ImageHandler.process(input_path)
189
+
190
+ # Handle export directory
191
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
192
+
193
+ # Run inference
194
+ run_inference(
195
+ image_paths=image_files,
196
+ export_dir=export_dir,
197
+ model_dir=model_dir,
198
+ device=device,
199
+ backend_url=final_backend_url,
200
+ export_format=export_format,
201
+ process_res=process_res,
202
+ process_res_method=process_res_method,
203
+ export_feat_layers=export_feat_layers,
204
+ conf_thresh_percentile=conf_thresh_percentile,
205
+ num_max_points=num_max_points,
206
+ show_cameras=show_cameras,
207
+ feat_vis_fps=feat_vis_fps,
208
+ )
209
+
210
+ elif input_type == "images":
211
+ typer.echo("Processing directory of images...")
212
+ # Process input - use default extensions
213
+ image_files = ImagesHandler.process(input_path, "png,jpg,jpeg")
214
+
215
+ # Handle export directory
216
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
217
+
218
+ # Run inference
219
+ run_inference(
220
+ image_paths=image_files,
221
+ export_dir=export_dir,
222
+ model_dir=model_dir,
223
+ device=device,
224
+ backend_url=final_backend_url,
225
+ export_format=export_format,
226
+ process_res=process_res,
227
+ process_res_method=process_res_method,
228
+ export_feat_layers=export_feat_layers,
229
+ conf_thresh_percentile=conf_thresh_percentile,
230
+ num_max_points=num_max_points,
231
+ show_cameras=show_cameras,
232
+ feat_vis_fps=feat_vis_fps,
233
+ )
234
+
235
+ elif input_type == "video":
236
+ typer.echo(f"Processing video with FPS={fps}...")
237
+ # Handle export directory
238
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
239
+
240
+ # Process input
241
+ image_files = VideoHandler.process(input_path, export_dir, fps)
242
+
243
+ # Run inference
244
+ run_inference(
245
+ image_paths=image_files,
246
+ export_dir=export_dir,
247
+ model_dir=model_dir,
248
+ device=device,
249
+ backend_url=final_backend_url,
250
+ export_format=export_format,
251
+ process_res=process_res,
252
+ process_res_method=process_res_method,
253
+ export_feat_layers=export_feat_layers,
254
+ conf_thresh_percentile=conf_thresh_percentile,
255
+ num_max_points=num_max_points,
256
+ show_cameras=show_cameras,
257
+ feat_vis_fps=feat_vis_fps,
258
+ )
259
+
260
+ elif input_type == "colmap":
261
+ typer.echo(
262
+ f"Processing COLMAP directory (sparse subdirectory: '{sparse_subdir or 'default'}')..."
263
+ )
264
+ # Process input
265
+ image_files, extrinsics, intrinsics = ColmapHandler.process(input_path, sparse_subdir)
266
+
267
+ # Handle export directory
268
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
269
+
270
+ # Run inference
271
+ run_inference(
272
+ image_paths=image_files,
273
+ export_dir=export_dir,
274
+ model_dir=model_dir,
275
+ device=device,
276
+ backend_url=final_backend_url,
277
+ export_format=export_format,
278
+ process_res=process_res,
279
+ process_res_method=process_res_method,
280
+ export_feat_layers=export_feat_layers,
281
+ extrinsics=extrinsics,
282
+ intrinsics=intrinsics,
283
+ align_to_input_ext_scale=align_to_input_ext_scale,
284
+ conf_thresh_percentile=conf_thresh_percentile,
285
+ num_max_points=num_max_points,
286
+ show_cameras=show_cameras,
287
+ feat_vis_fps=feat_vis_fps,
288
+ )
289
+
290
+ typer.echo()
291
+ typer.echo("✅ Processing completed successfully!")
292
+
293
+
294
+ @app.command()
295
+ def image(
296
+ image_path: str = typer.Argument(..., help="Path to input image file"),
297
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
298
+ export_dir: str = typer.Option(DEFAULT_EXPORT_DIR, help="Export directory"),
299
+ export_format: str = typer.Option("glb", help="Export format"),
300
+ device: str = typer.Option("cuda", help="Device to use"),
301
+ use_backend: bool = typer.Option(False, help="Use backend service for inference"),
302
+ backend_url: str = typer.Option(
303
+ "http://localhost:8008", help="Backend URL (default: http://localhost:8008)"
304
+ ),
305
+ process_res: int = typer.Option(504, help="Processing resolution"),
306
+ process_res_method: str = typer.Option(
307
+ "upper_bound_resize", help="Processing resolution method"
308
+ ),
309
+ export_feat: str = typer.Option(
310
+ "",
311
+ help="[FEAT_VIS] Export features from specified layers using comma-separated indices (e.g., '0,1,2').",
312
+ ),
313
+ auto_cleanup: bool = typer.Option(
314
+ False, help="Automatically clean export directory if it exists (no prompt)"
315
+ ),
316
+ # GLB export options
317
+ conf_thresh_percentile: float = typer.Option(
318
+ 40.0, help="[GLB] Lower percentile for adaptive confidence threshold"
319
+ ),
320
+ num_max_points: int = typer.Option(
321
+ 1_000_000, help="[GLB] Maximum number of points in the point cloud"
322
+ ),
323
+ show_cameras: bool = typer.Option(
324
+ True, help="[GLB] Show camera wireframes in the exported scene"
325
+ ),
326
+ # Feat_vis export options
327
+ feat_vis_fps: int = typer.Option(15, help="[FEAT_VIS] Frame rate for output video"),
328
+ ):
329
+ """Run camera pose and depth estimation on a single image."""
330
+ # Process input
331
+ image_files = ImageHandler.process(image_path)
332
+
333
+ # Handle export directory
334
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
335
+
336
+ # Parse export_feat parameter
337
+ export_feat_layers = parse_export_feat(export_feat)
338
+
339
+ # Determine backend URL based on use_backend flag
340
+ final_backend_url = backend_url if use_backend else None
341
+
342
+ # Run inference
343
+ run_inference(
344
+ image_paths=image_files,
345
+ export_dir=export_dir,
346
+ model_dir=model_dir,
347
+ device=device,
348
+ backend_url=final_backend_url,
349
+ export_format=export_format,
350
+ process_res=process_res,
351
+ process_res_method=process_res_method,
352
+ export_feat_layers=export_feat_layers,
353
+ conf_thresh_percentile=conf_thresh_percentile,
354
+ num_max_points=num_max_points,
355
+ show_cameras=show_cameras,
356
+ feat_vis_fps=feat_vis_fps,
357
+ )
358
+
359
+
360
+ @app.command()
361
+ def images(
362
+ images_dir: str = typer.Argument(..., help="Path to directory containing input images"),
363
+ image_extensions: str = typer.Option(
364
+ "png,jpg,jpeg", help="Comma-separated image file extensions to process"
365
+ ),
366
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
367
+ export_dir: str = typer.Option(DEFAULT_EXPORT_DIR, help="Export directory"),
368
+ export_format: str = typer.Option("glb", help="Export format"),
369
+ device: str = typer.Option("cuda", help="Device to use"),
370
+ use_backend: bool = typer.Option(False, help="Use backend service for inference"),
371
+ backend_url: str = typer.Option(
372
+ "http://localhost:8008", help="Backend URL (default: http://localhost:8008)"
373
+ ),
374
+ process_res: int = typer.Option(504, help="Processing resolution"),
375
+ process_res_method: str = typer.Option(
376
+ "upper_bound_resize", help="Processing resolution method"
377
+ ),
378
+ export_feat: str = typer.Option(
379
+ "",
380
+ help="[FEAT_VIS] Export features from specified layers using comma-separated indices (e.g., '0,1,2').",
381
+ ),
382
+ auto_cleanup: bool = typer.Option(
383
+ False, help="Automatically clean export directory if it exists (no prompt)"
384
+ ),
385
+ # GLB export options
386
+ conf_thresh_percentile: float = typer.Option(
387
+ 40.0, help="[GLB] Lower percentile for adaptive confidence threshold"
388
+ ),
389
+ num_max_points: int = typer.Option(
390
+ 1_000_000, help="[GLB] Maximum number of points in the point cloud"
391
+ ),
392
+ show_cameras: bool = typer.Option(
393
+ True, help="[GLB] Show camera wireframes in the exported scene"
394
+ ),
395
+ # Feat_vis export options
396
+ feat_vis_fps: int = typer.Option(15, help="[FEAT_VIS] Frame rate for output video"),
397
+ ):
398
+ """Run camera pose and depth estimation on a directory of images."""
399
+ # Process input
400
+ image_files = ImagesHandler.process(images_dir, image_extensions)
401
+
402
+ # Handle export directory
403
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
404
+
405
+ # Parse export_feat parameter
406
+ export_feat_layers = parse_export_feat(export_feat)
407
+
408
+ # Determine backend URL based on use_backend flag
409
+ final_backend_url = backend_url if use_backend else None
410
+
411
+ # Run inference
412
+ run_inference(
413
+ image_paths=image_files,
414
+ export_dir=export_dir,
415
+ model_dir=model_dir,
416
+ device=device,
417
+ backend_url=final_backend_url,
418
+ export_format=export_format,
419
+ process_res=process_res,
420
+ process_res_method=process_res_method,
421
+ export_feat_layers=export_feat_layers,
422
+ conf_thresh_percentile=conf_thresh_percentile,
423
+ num_max_points=num_max_points,
424
+ show_cameras=show_cameras,
425
+ feat_vis_fps=feat_vis_fps,
426
+ )
427
+
428
+
429
+ @app.command()
430
+ def colmap(
431
+ colmap_dir: str = typer.Argument(
432
+ ..., help="Path to COLMAP directory containing 'images' and 'sparse' subdirectories"
433
+ ),
434
+ sparse_subdir: str = typer.Option(
435
+ "", help="Sparse reconstruction subdirectory (e.g., '0' for sparse/0/, empty for sparse/)"
436
+ ),
437
+ align_to_input_ext_scale: bool = typer.Option(
438
+ True, help="Align prediction to input extrinsics scale"
439
+ ),
440
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
441
+ export_dir: str = typer.Option(DEFAULT_EXPORT_DIR, help="Export directory"),
442
+ export_format: str = typer.Option("glb", help="Export format"),
443
+ device: str = typer.Option("cuda", help="Device to use"),
444
+ use_backend: bool = typer.Option(False, help="Use backend service for inference"),
445
+ backend_url: str = typer.Option(
446
+ "http://localhost:8008", help="Backend URL (default: http://localhost:8008)"
447
+ ),
448
+ process_res: int = typer.Option(504, help="Processing resolution"),
449
+ process_res_method: str = typer.Option(
450
+ "upper_bound_resize", help="Processing resolution method"
451
+ ),
452
+ export_feat: str = typer.Option(
453
+ "",
454
+ help="Export features from specified layers using comma-separated indices (e.g., '0,1,2').",
455
+ ),
456
+ auto_cleanup: bool = typer.Option(
457
+ False, help="Automatically clean export directory if it exists (no prompt)"
458
+ ),
459
+ # GLB export options
460
+ conf_thresh_percentile: float = typer.Option(
461
+ 40.0, help="[GLB] Lower percentile for adaptive confidence threshold"
462
+ ),
463
+ num_max_points: int = typer.Option(
464
+ 1_000_000, help="[GLB] Maximum number of points in the point cloud"
465
+ ),
466
+ show_cameras: bool = typer.Option(
467
+ True, help="[GLB] Show camera wireframes in the exported scene"
468
+ ),
469
+ # Feat_vis export options
470
+ feat_vis_fps: int = typer.Option(15, help="[FEAT_VIS] Frame rate for output video"),
471
+ ):
472
+ """Run pose conditioned depth estimation on COLMAP data."""
473
+ # Process input
474
+ image_files, extrinsics, intrinsics = ColmapHandler.process(colmap_dir, sparse_subdir)
475
+
476
+ # Handle export directory
477
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
478
+
479
+ # Parse export_feat parameter
480
+ export_feat_layers = parse_export_feat(export_feat)
481
+
482
+ # Determine backend URL based on use_backend flag
483
+ final_backend_url = backend_url if use_backend else None
484
+
485
+ # Run inference
486
+ run_inference(
487
+ image_paths=image_files,
488
+ export_dir=export_dir,
489
+ model_dir=model_dir,
490
+ device=device,
491
+ backend_url=final_backend_url,
492
+ export_format=export_format,
493
+ process_res=process_res,
494
+ process_res_method=process_res_method,
495
+ export_feat_layers=export_feat_layers,
496
+ extrinsics=extrinsics,
497
+ intrinsics=intrinsics,
498
+ align_to_input_ext_scale=align_to_input_ext_scale,
499
+ conf_thresh_percentile=conf_thresh_percentile,
500
+ num_max_points=num_max_points,
501
+ show_cameras=show_cameras,
502
+ feat_vis_fps=feat_vis_fps,
503
+ )
504
+
505
+
506
+ @app.command()
507
+ def video(
508
+ video_path: str = typer.Argument(..., help="Path to input video file"),
509
+ fps: float = typer.Option(1.0, help="Sampling FPS for frame extraction"),
510
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
511
+ export_dir: str = typer.Option(DEFAULT_EXPORT_DIR, help="Export directory"),
512
+ export_format: str = typer.Option("glb", help="Export format"),
513
+ device: str = typer.Option("cuda", help="Device to use"),
514
+ use_backend: bool = typer.Option(False, help="Use backend service for inference"),
515
+ backend_url: str = typer.Option(
516
+ "http://localhost:8008", help="Backend URL (default: http://localhost:8008)"
517
+ ),
518
+ process_res: int = typer.Option(504, help="Processing resolution"),
519
+ process_res_method: str = typer.Option(
520
+ "upper_bound_resize", help="Processing resolution method"
521
+ ),
522
+ export_feat: str = typer.Option(
523
+ "",
524
+ help="[FEAT_VIS] Export features from specified layers using comma-separated indices (e.g., '0,1,2').",
525
+ ),
526
+ auto_cleanup: bool = typer.Option(
527
+ False, help="Automatically clean export directory if it exists (no prompt)"
528
+ ),
529
+ # GLB export options
530
+ conf_thresh_percentile: float = typer.Option(
531
+ 40.0, help="[GLB] Lower percentile for adaptive confidence threshold"
532
+ ),
533
+ num_max_points: int = typer.Option(
534
+ 1_000_000, help="[GLB] Maximum number of points in the point cloud"
535
+ ),
536
+ show_cameras: bool = typer.Option(
537
+ True, help="[GLB] Show camera wireframes in the exported scene"
538
+ ),
539
+ # Feat_vis export options
540
+ feat_vis_fps: int = typer.Option(15, help="[FEAT_VIS] Frame rate for output video"),
541
+ ):
542
+ """Run depth estimation on video by extracting frames and processing them."""
543
+ # Handle export directory
544
+ export_dir = InputHandler.handle_export_dir(export_dir, auto_cleanup)
545
+
546
+ # Process input
547
+ image_files = VideoHandler.process(video_path, export_dir, fps)
548
+
549
+ # Parse export_feat parameter
550
+ export_feat_layers = parse_export_feat(export_feat)
551
+
552
+ # Determine backend URL based on use_backend flag
553
+ final_backend_url = backend_url if use_backend else None
554
+
555
+ # Run inference
556
+ run_inference(
557
+ image_paths=image_files,
558
+ export_dir=export_dir,
559
+ model_dir=model_dir,
560
+ device=device,
561
+ backend_url=final_backend_url,
562
+ export_format=export_format,
563
+ process_res=process_res,
564
+ process_res_method=process_res_method,
565
+ export_feat_layers=export_feat_layers,
566
+ conf_thresh_percentile=conf_thresh_percentile,
567
+ num_max_points=num_max_points,
568
+ show_cameras=show_cameras,
569
+ feat_vis_fps=feat_vis_fps,
570
+ )
571
+
572
+
573
+ # ============================================================================
574
+ # Service management commands
575
+ # ============================================================================
576
+
577
+
578
+ @app.command()
579
+ def backend(
580
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
581
+ device: str = typer.Option("cuda", help="Device to use"),
582
+ host: str = typer.Option("127.0.0.1", help="Host to bind to"),
583
+ port: int = typer.Option(8008, help="Port to bind to"),
584
+ gallery_dir: str = typer.Option(DEFAULT_GALLERY_DIR, help="Gallery directory path (optional)"),
585
+ ):
586
+ """Start model backend service with integrated gallery."""
587
+ typer.echo("=" * 60)
588
+ typer.echo("🚀 Starting Depth Anything 3 Backend Server")
589
+ typer.echo("=" * 60)
590
+ typer.echo(f"Model directory: {model_dir}")
591
+ typer.echo(f"Device: {device}")
592
+
593
+ # Check if gallery directory exists
594
+ if gallery_dir and os.path.exists(gallery_dir):
595
+ typer.echo(f"Gallery directory: {gallery_dir}")
596
+ else:
597
+ gallery_dir = None # Disable gallery if directory doesn't exist
598
+
599
+ typer.echo()
600
+ typer.echo("📡 Server URLs (Ctrl/CMD+Click to open):")
601
+ typer.echo(f" 🏠 Home: http://{host}:{port}")
602
+ typer.echo(f" 📊 Dashboard: http://{host}:{port}/dashboard")
603
+ typer.echo(f" 📈 API Status: http://{host}:{port}/status")
604
+
605
+ if gallery_dir:
606
+ typer.echo(f" 🎨 Gallery: http://{host}:{port}/gallery/")
607
+
608
+ typer.echo("=" * 60)
609
+
610
+ try:
611
+ start_server(model_dir, device, host, port, gallery_dir)
612
+ except KeyboardInterrupt:
613
+ typer.echo("\n👋 Backend server stopped.")
614
+ except Exception as e:
615
+ typer.echo(f"❌ Failed to start backend: {e}")
616
+ raise typer.Exit(1)
617
+
618
+
619
+ # ============================================================================
620
+ # Application launch commands
621
+ # ============================================================================
622
+
623
+
624
+ @app.command()
625
+ def gradio(
626
+ model_dir: str = typer.Option(DEFAULT_MODEL, help="Model directory path"),
627
+ workspace_dir: str = typer.Option(DEFAULT_GRADIO_DIR, help="Workspace directory path"),
628
+ gallery_dir: str = typer.Option(DEFAULT_GALLERY_DIR, help="Gallery directory path"),
629
+ host: str = typer.Option("127.0.0.1", help="Host address to bind to"),
630
+ port: int = typer.Option(7860, help="Port number to bind to"),
631
+ share: bool = typer.Option(False, help="Create a public link for the app"),
632
+ debug: bool = typer.Option(False, help="Enable debug mode"),
633
+ cache_examples: bool = typer.Option(
634
+ False, help="Pre-cache all example scenes at startup for faster loading"
635
+ ),
636
+ cache_gs_tag: str = typer.Option(
637
+ "",
638
+ help="Tag to match scene names for high-res+3DGS caching (e.g., 'dl3dv'). Scenes containing this tag will use high_res and infer_gs=True; others will use low_res only.",
639
+ ),
640
+ ):
641
+ """Launch Depth Anything 3 Gradio interactive web application"""
642
+ from depth_anything_3.app.gradio_app import DepthAnything3App
643
+
644
+ # Create necessary directories
645
+ os.makedirs(workspace_dir, exist_ok=True)
646
+ os.makedirs(gallery_dir, exist_ok=True)
647
+
648
+ typer.echo("Launching Depth Anything 3 Gradio application...")
649
+ typer.echo(f"Model directory: {model_dir}")
650
+ typer.echo(f"Workspace directory: {workspace_dir}")
651
+ typer.echo(f"Gallery directory: {gallery_dir}")
652
+ typer.echo(f"Host: {host}")
653
+ typer.echo(f"Port: {port}")
654
+ typer.echo(f"Share: {share}")
655
+ typer.echo(f"Debug mode: {debug}")
656
+ typer.echo(f"Cache examples: {cache_examples}")
657
+ if cache_examples:
658
+ if cache_gs_tag:
659
+ typer.echo(
660
+ f"Cache GS Tag: '{cache_gs_tag}' (scenes matching this tag will use high-res + 3DGS)"
661
+ )
662
+ else:
663
+ typer.echo(f"Cache GS Tag: None (all scenes will use low-res only)")
664
+
665
+ try:
666
+ # Initialize and launch application
667
+ app = DepthAnything3App(
668
+ model_dir=model_dir, workspace_dir=workspace_dir, gallery_dir=gallery_dir
669
+ )
670
+
671
+ # Pre-cache examples if requested
672
+ if cache_examples:
673
+ typer.echo("\n" + "=" * 60)
674
+ typer.echo("Pre-caching mode enabled")
675
+ if cache_gs_tag:
676
+ typer.echo(f"Scenes containing '{cache_gs_tag}' will use HIGH-RES + 3DGS")
677
+ typer.echo(f"Other scenes will use LOW-RES only")
678
+ else:
679
+ typer.echo(f"All scenes will use LOW-RES only")
680
+ typer.echo("=" * 60)
681
+ app.cache_examples(
682
+ show_cam=True,
683
+ filter_black_bg=False,
684
+ filter_white_bg=False,
685
+ save_percentage=20.0,
686
+ num_max_points=1000,
687
+ cache_gs_tag=cache_gs_tag,
688
+ gs_trj_mode="smooth",
689
+ gs_video_quality="low",
690
+ )
691
+
692
+ # Prepare launch arguments
693
+ launch_kwargs = {"share": share, "debug": debug}
694
+
695
+ app.launch(host=host, port=port, **launch_kwargs)
696
+
697
+ except KeyboardInterrupt:
698
+ typer.echo("\nGradio application stopped.")
699
+ except Exception as e:
700
+ typer.echo(f"Failed to launch Gradio application: {e}")
701
+ raise typer.Exit(1)
702
+
703
+
704
+ @app.command()
705
+ def gallery(
706
+ gallery_dir: str = typer.Option(DEFAULT_GALLERY_DIR, help="Gallery root directory"),
707
+ host: str = typer.Option("127.0.0.1", help="Host address to bind to"),
708
+ port: int = typer.Option(8007, help="Port number to bind to"),
709
+ open_browser: bool = typer.Option(False, help="Open browser after launch"),
710
+ ):
711
+ """Launch Depth Anything 3 Gallery server"""
712
+
713
+ # Validate gallery directory
714
+ if not os.path.exists(gallery_dir):
715
+ raise typer.BadParameter(f"Gallery directory not found: {gallery_dir}")
716
+
717
+ typer.echo("Launching Depth Anything 3 Gallery server...")
718
+ typer.echo(f"Gallery directory: {gallery_dir}")
719
+ typer.echo(f"Host: {host}")
720
+ typer.echo(f"Port: {port}")
721
+ typer.echo(f"Auto-open browser: {open_browser}")
722
+
723
+ try:
724
+ # Set command line arguments
725
+ import sys
726
+
727
+ sys.argv = ["gallery", "--dir", gallery_dir, "--host", host, "--port", str(port)]
728
+ if open_browser:
729
+ sys.argv.append("--open")
730
+
731
+ # Launch gallery server
732
+ gallery_main()
733
+
734
+ except KeyboardInterrupt:
735
+ typer.echo("\nGallery server stopped.")
736
+ except Exception as e:
737
+ typer.echo(f"Failed to launch Gallery server: {e}")
738
+ raise typer.Exit(1)
739
+
740
+
741
+ if __name__ == "__main__":
742
+ app()
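Note on usage: the gradio command above is a thin CLI wrapper around DepthAnything3App. As a minimal sketch, the same app can be launched programmatically, assuming depth_anything_3 is importable; the directory values below are placeholders, not the CLI defaults.

# Minimal sketch: start the Gradio app without going through the Typer CLI.
# The model/workspace/gallery paths are hypothetical placeholders.
import os

from depth_anything_3.app.gradio_app import DepthAnything3App

model_dir = "path/to/model"          # placeholder
workspace_dir = "workspace/gradio"   # placeholder
gallery_dir = "workspace/gallery"    # placeholder

os.makedirs(workspace_dir, exist_ok=True)
os.makedirs(gallery_dir, exist_ok=True)

demo = DepthAnything3App(
    model_dir=model_dir, workspace_dir=workspace_dir, gallery_dir=gallery_dir
)
demo.launch(host="127.0.0.1", port=7860, share=False, debug=False)

This mirrors what the gradio command does, minus the console output and example pre-caching.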
src/depth_anything_3/configs/da3-base.yaml ADDED
@@ -0,0 +1,45 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vitb
+  out_layers: [5, 7, 9, 11]
+  alt_start: 4
+  qknorm_start: 4
+  rope_start: 4
+  cat_token: True
+
+head:
+  __object__:
+    path: depth_anything_3.model.dualdpt
+    name: DualDPT
+    args: as_params
+
+  dim_in: &head_dim_in 1536
+  output_dim: 2
+  features: &head_features 128
+  out_channels: &head_out_channels [96, 192, 384, 768]
+
+
+cam_enc:
+  __object__:
+    path: depth_anything_3.model.cam_enc
+    name: CameraEnc
+    args: as_params
+
+  dim_out: 768
+
+cam_dec:
+  __object__:
+    path: depth_anything_3.model.cam_dec
+    name: CameraDec
+    args: as_params
+
+  dim_in: 1536
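This config and the ones that follow share an __object__ convention: each block names an import path and a class, and args: as_params indicates that the block's remaining keys are constructor parameters. As an illustration only (the actual resolution logic lives in src/depth_anything_3/cfg.py and may differ), such a spec could be instantiated roughly like this:

# Rough illustration of resolving an __object__ spec; not the project's
# actual loader (see src/depth_anything_3/cfg.py). Assumes "args: as_params"
# means "pass the remaining sibling keys as constructor keyword arguments".
import importlib


def build_object(spec: dict):
    meta = spec["__object__"]
    cls = getattr(importlib.import_module(meta["path"]), meta["name"])
    params = {k: v for k, v in spec.items() if k != "__object__"}
    # Sub-blocks that carry their own __object__ (net, head, cam_enc, cam_dec
    # above) would be built recursively before being passed along.
    params = {
        k: build_object(v) if isinstance(v, dict) and "__object__" in v else v
        for k, v in params.items()
    }
    return cls(**params)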
src/depth_anything_3/configs/da3-giant.yaml ADDED
@@ -0,0 +1,71 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vitg
+  out_layers: [19, 27, 33, 39]
+  alt_start: 13
+  qknorm_start: 13
+  rope_start: 13
+  cat_token: True
+
+head:
+  __object__:
+    path: depth_anything_3.model.dualdpt
+    name: DualDPT
+    args: as_params
+
+  dim_in: &head_dim_in 3072
+  output_dim: 2
+  features: &head_features 256
+  out_channels: &head_out_channels [256, 512, 1024, 1024]
+
+
+cam_enc:
+  __object__:
+    path: depth_anything_3.model.cam_enc
+    name: CameraEnc
+    args: as_params
+
+  dim_out: 1536
+
+cam_dec:
+  __object__:
+    path: depth_anything_3.model.cam_dec
+    name: CameraDec
+    args: as_params
+
+  dim_in: 3072
+
+
+gs_head:
+  __object__:
+    path: depth_anything_3.model.gsdpt
+    name: GSDPT
+    args: as_params
+
+  dim_in: *head_dim_in
+  output_dim: 38  # should align with gs_adapter's setting, for gs params
+  features: *head_features
+  out_channels: *head_out_channels
+
+
+gs_adapter:
+  __object__:
+    path: depth_anything_3.model.gs_adapter
+    name: GaussianAdapter
+    args: as_params
+
+  sh_degree: 2
+  pred_color: false  # predict SH coefficient if false
+  pred_offset_depth: true
+  pred_offset_xy: true
+  gaussian_scale_min: 1e-5
+  gaussian_scale_max: 30.0
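The output_dim: 38 on gs_head has to match what gs_adapter expects per pixel, as the inline comment notes. One plausible accounting, assuming a standard 3D Gaussian Splatting parameterization (SH color, opacity, scale, rotation quaternion) plus the offsets enabled above, is sketched here; the authoritative layout is whatever GaussianAdapter implements, so treat the breakdown as an illustration:

# Hypothetical channel breakdown for sh_degree=2, pred_color=false,
# pred_offset_depth=true, pred_offset_xy=true (illustrative only).
sh_degree = 2
sh_coeffs = 3 * (sh_degree + 1) ** 2  # 27: RGB SH coefficients (pred_color is false)
opacity = 1
scale = 3
rotation = 4                          # quaternion
offset_depth = 1                      # pred_offset_depth: true
offset_xy = 2                         # pred_offset_xy: true
total = sh_coeffs + opacity + scale + rotation + offset_depth + offset_xy
assert total == 38                    # matches gs_head.output_dim above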
src/depth_anything_3/configs/da3-large.yaml ADDED
@@ -0,0 +1,45 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vitl
+  out_layers: [11, 15, 19, 23]
+  alt_start: 8
+  qknorm_start: 8
+  rope_start: 8
+  cat_token: True
+
+head:
+  __object__:
+    path: depth_anything_3.model.dualdpt
+    name: DualDPT
+    args: as_params
+
+  dim_in: &head_dim_in 2048
+  output_dim: 2
+  features: &head_features 256
+  out_channels: &head_out_channels [256, 512, 1024, 1024]
+
+
+cam_enc:
+  __object__:
+    path: depth_anything_3.model.cam_enc
+    name: CameraEnc
+    args: as_params
+
+  dim_out: 1024
+
+cam_dec:
+  __object__:
+    path: depth_anything_3.model.cam_dec
+    name: CameraDec
+    args: as_params
+
+  dim_in: 2048
src/depth_anything_3/configs/da3-small.yaml ADDED
@@ -0,0 +1,45 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vits
+  out_layers: [5, 7, 9, 11]
+  alt_start: 4
+  qknorm_start: 4
+  rope_start: 4
+  cat_token: True
+
+head:
+  __object__:
+    path: depth_anything_3.model.dualdpt
+    name: DualDPT
+    args: as_params
+
+  dim_in: &head_dim_in 768
+  output_dim: 2
+  features: &head_features 64
+  out_channels: &head_out_channels [48, 96, 192, 384]
+
+
+cam_enc:
+  __object__:
+    path: depth_anything_3.model.cam_enc
+    name: CameraEnc
+    args: as_params
+
+  dim_out: 384
+
+cam_dec:
+  __object__:
+    path: depth_anything_3.model.cam_dec
+    name: CameraDec
+    args: as_params
+
+  dim_in: 768
src/depth_anything_3/configs/da3metric-large.yaml ADDED
@@ -0,0 +1,28 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vitl
+  out_layers: [4, 11, 17, 23]
+  alt_start: -1  # -1 means disable
+  qknorm_start: -1
+  rope_start: -1
+  cat_token: False
+
+head:
+  __object__:
+    path: depth_anything_3.model.dpt
+    name: DPT
+    args: as_params
+
+  dim_in: 1024
+  output_dim: 1
+  features: 256
+  out_channels: [256, 512, 1024, 1024]
src/depth_anything_3/configs/da3mono-large.yaml ADDED
@@ -0,0 +1,28 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: DepthAnything3Net
+  args: as_params
+
+net:
+  __object__:
+    path: depth_anything_3.model.dinov2.dinov2
+    name: DinoV2
+    args: as_params
+
+  name: vitl
+  out_layers: [4, 11, 17, 23]
+  alt_start: -1  # -1 means disable
+  qknorm_start: -1
+  rope_start: -1
+  cat_token: False
+
+head:
+  __object__:
+    path: depth_anything_3.model.dpt
+    name: DPT
+    args: as_params
+
+  dim_in: 1024
+  output_dim: 1
+  features: 256
+  out_channels: [256, 512, 1024, 1024]
src/depth_anything_3/configs/da3nested-giant-large.yaml ADDED
@@ -0,0 +1,10 @@
+__object__:
+  path: depth_anything_3.model.da3
+  name: NestedDepthAnything3Net
+  args: as_params
+
+anyview:
+  __inherit__: depth_anything_3.configs.da3-giant
+
+metric:
+  __inherit__: depth_anything_3.configs.da3metric-large
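This nested config does not define a new network itself; each sub-block uses __inherit__ to point at one of the configs above (the any-view giant model and the metric large model). As a toy sketch only, assuming the dotted name maps onto a YAML file shipped inside the package, such a reference might be expanded like this; the project's real loader in cfg.py may resolve and merge these differently:

# Illustration only: expand "depth_anything_3.configs.da3-giant" into the
# contents of depth_anything_3/configs/da3-giant.yaml. Requires PyYAML.
import os

import yaml


def resolve_inherit(ref: str, package_root: str) -> dict:
    # "pkg.configs.name" -> "pkg/configs/name.yaml" (only the first two dots
    # become path separators, so dashes in config names are preserved)
    rel_path = ref.replace(".", os.sep, 2) + ".yaml"
    with open(os.path.join(package_root, rel_path)) as f:
        return yaml.safe_load(f)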
src/depth_anything_3/model/__init__.py ADDED
@@ -0,0 +1,20 @@
+# Copyright (c) 2025 ByteDance Ltd. and/or its affiliates
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from depth_anything_3.model.da3 import DepthAnything3Net, NestedDepthAnything3Net
+
+__export__ = [
+    NestedDepthAnything3Net,
+    DepthAnything3Net,
+]
src/depth_anything_3/model/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (379 Bytes).
src/depth_anything_3/model/__pycache__/cam_dec.cpython-311.pyc ADDED
Binary file (2.72 kB).
src/depth_anything_3/model/__pycache__/cam_enc.cpython-311.pyc ADDED
Binary file (3.24 kB).