Update README.md
README.md
CHANGED
@@ -275,7 +275,7 @@ lm_eval \
 ## Inference Performance
 
 
-This model achieves up to 2.80x speedup in single-stream deployment and up to 1.
+This model achieves up to 2.80x speedup in single-stream deployment and up to 1.75x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
 The following performance benchmarks were conducted with [vLLM](https://docs.vllm.ai/en/latest/) version 0.7.2, and [GuideLLM](https://github.com/neuralmagic/guidellm).
 
 <details>
@@ -427,21 +427,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
-<td>
+<td>0.8</td>
 <td>766</td>
-<td>
+<td>1.1</td>
 <td>1142</td>
-<td>
+<td>1.3</td>
 <td>1348</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
-<td>
+<td>0.5</td>
 <td>552</td>
-<td>
+<td>1.0</td>
 <td>1010</td>
-<td>
+<td>1.4</td>
 <td>1360</td>
 </tr>
 <tr>
@@ -458,21 +458,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
-<td>
+<td>1.7</td>
 <td>905</td>
-<td>
+<td>2.6</td>
 <td>1406</td>
-<td>
+<td>3.2</td>
 <td>1759</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
-<td>
+<td>1.4</td>
 <td>761</td>
-<td>
+<td>2.2</td>
 <td>1228</td>
-<td>
+<td>2.7</td>
 <td>1480</td>
 </tr>
 </tbody>