| license: apache-2.0 | |
| datasets: | |
| - ILSVRC/imagenet-1k | |
| model-index: | |
| - name: VQGAN+ | |
| results: | |
| - task: | |
| type: image-generation | |
| dataset: | |
| name: ILSVRC/imagenet-1k | |
| type: ILSVRC/imagenet-1k | |
| metrics: | |
| - name: rFID | |
| type: rFID | |
| value: 1.39 | |
| - name: InceptionScore | |
| type: InceptionScore | |
| value: 193.9 | |
| - name: LPIPS | |
| type: LPIPS | |
| value: 0.315 | |
| - name: PSNR | |
| type: PSNR | |
| value: 21 | |
| - name: SSIM | |
| type: SSIM | |
| value: 0.55 | |
| - name: CodebookUsage | |
| type: CodebookUsage | |
| value: 1.0 | |
| This model is the VQGAN+ tokenizer with a 12-bit vocabulary, i.e. a codebook of 2^12 = 4096 entries. It uses a downsampling factor of 16 and is trained on ImageNet at a resolution of 256×256, so each image is encoded as a 16×16 grid of 256 tokens. | |
| You can find more details on the [project page](https://weber-mark.github.io/projects/maskbit.html) and in the [paper](https://arxiv.org/abs/2409.16211). |
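
The numbers in the description can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the released code: `tokenizer_shape` is a hypothetical helper, and it assumes square inputs, a downsampling factor applied to each spatial dimension, and that "12 bits" means a codebook of 2^12 entries.

```python
def tokenizer_shape(image_size: int = 256, downsample: int = 16, vocab_bits: int = 12):
    """Return (tokens per side, tokens per image, codebook entries, bits per image).

    Hypothetical helper for illustration; assumes a square image and that the
    downsampling factor applies independently to height and width.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample      # latent grid side length
    tokens = side * side                 # discrete tokens per image
    codebook = 2 ** vocab_bits           # number of distinct token ids
    bits = tokens * vocab_bits           # raw size of the token sequence in bits
    return side, tokens, codebook, bits


side, tokens, codebook, bits = tokenizer_shape()
print(f"{side}x{side} grid, {tokens} tokens, {codebook} codebook entries, {bits} bits per image")
```

Under these assumptions, a 256×256 image maps to a 16×16 grid of 256 tokens, each drawn from 4096 possible ids.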