Update README.md

README.md CHANGED

@@ -16,19 +16,17 @@ They were created using the [convert.py script](https://github.com/chrisgoringe/
 They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your
 models/unet directory.
 
-## Bigger numbers in the name = smaller model!
-
 ## Naming convention (mx for 'mixed')
 
-[original_model_name]
+[original_model_name]_mxN_N.gguf
 
-where
+where N_N is the actual average number of bits per parameter.
 ```
-
-
-
-
-
+- 9_6 might just fit on a 16GB card
+- 8_4 is a good balance for 16GB cards,
+- 7_4 is roughly the size of an 8 bit model,
+- 5_9 should work for 12 GB cards
+- 5_1 is mostly quantised to Q4_1
 ```
 ## How is this optimised?
 
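The `N_N` suffix makes size planning a one-line calculation: bytes ≈ parameters × bits ÷ 8. Below is a minimal sketch of that arithmetic, assuming a roughly 12B-parameter model such as Flux; `approx_gguf_gb` is a hypothetical helper for illustration, not part of the convert.py script.

```python
# Hypothetical helper: rough GGUF file size implied by the 'mxN_N' suffix.
# Assumes the file size is dominated by the weights (metadata ignored).
def approx_gguf_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

# Assuming a ~12B-parameter model such as Flux:
for name, bits in [("9_6", 9.6), ("8_4", 8.4), ("7_4", 7.4), ("5_9", 5.9), ("5_1", 5.1)]:
    print(f"mx{name}: ~{approx_gguf_gb(12e9, bits):.1f} GB")
```

Under that assumption the numbers line up with the guidance above: mx9_6 comes to ~14.4 GB (tight on a 16GB card once activations are counted), mx8_4 to ~12.6 GB, and mx5_9 to ~8.9 GB.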
@@ -59,7 +57,7 @@ The optimisation recipes are as follows (layers 0-18 are the double_block_layers
 ```python
 
 CONFIGURATIONS = {
-    "
+    "9_6" : {
         'casts': [
             {'layers': '0-10', 'castto': 'BF16'},
             {'layers': '11-14, 54', 'castto': 'Q8_0'},
@@ -67,7 +65,7 @@ CONFIGURATIONS = {
             {'layers': '37-38, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "8_4" : {
         'casts': [
             {'layers': '0-4, 10', 'castto': 'BF16'},
             {'layers': '5-9, 11-14', 'castto': 'Q8_0'},
@@ -75,7 +73,7 @@ CONFIGURATIONS = {
             {'layers': '36-40, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "7_4" : {
         'casts': [
             {'layers': '0-2', 'castto': 'BF16'},
             {'layers': '5, 7-12', 'castto': 'Q8_0'},
@@ -83,13 +81,13 @@ CONFIGURATIONS = {
             {'layers': '34-41, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_9" : {
         'casts': [
             {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
             {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_1" : {
         'casts': [
             {'layers': '0-56', 'castto': 'Q4_1'},
         ]
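To make the recipe format concrete, here is a minimal sketch of how the `'layers'` strings could be expanded into a per-layer cast table; `parse_layers` and `layer_map` are hypothetical helpers written for this note, not functions taken from the convert.py script (which may resolve overlaps or defaults differently).

```python
def parse_layers(spec: str) -> list[int]:
    """Expand a spec like '11-14, 54' into [11, 12, 13, 14, 54]."""
    layers: list[int] = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            layers.extend(range(int(lo), int(hi) + 1))
        else:
            layers.append(int(part))
    return layers

def layer_map(recipe: dict) -> dict[int, str]:
    """Map each layer index (0-56) to its target cast for one recipe."""
    mapping: dict[int, str] = {}
    for cast in recipe['casts']:
        for layer in parse_layers(cast['layers']):
            mapping[layer] = cast['castto']
    return mapping

# e.g. layer_map(CONFIGURATIONS["5_9"])[26] == 'Q4_1'
```

Note that the hunks above show only the changed lines of each recipe, so the `'casts'` lists here are not complete; one unchanged cast line is elided between consecutive hunks.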