shubhrapandit committed
Commit 0a24006 · verified · 1 Parent(s): 4898ae6

Update README.md

Files changed (1): README.md (+13 −13)
README.md CHANGED
```diff
@@ -275,7 +275,7 @@ lm_eval \
 ## Inference Performance
 
 
-This model achieves up to 2.80x speedup in single-stream deployment and up to 1.70x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
+This model achieves up to 2.80x speedup in single-stream deployment and up to 1.75x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
 The following performance benchmarks were conducted with [vLLM](https://docs.vllm.ai/en/latest/) version 0.7.2, and [GuideLLM](https://github.com/neuralmagic/guidellm).
 
 <details>
@@ -427,21 +427,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
-<td>1.6</td>
+<td>0.8</td>
 <td>766</td>
-<td>2.2</td>
+<td>1.1</td>
 <td>1142</td>
-<td>2.6</td>
+<td>1.3</td>
 <td>1348</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
-<td>1.0</td>
+<td>0.5</td>
 <td>552</td>
-<td>2.0</td>
+<td>1.0</td>
 <td>1010</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>1360</td>
 </tr>
 <tr>
@@ -458,21 +458,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
-<td>3.4</td>
+<td>1.7</td>
 <td>905</td>
-<td>5.2</td>
+<td>2.6</td>
 <td>1406</td>
-<td>6.4</td>
+<td>3.2</td>
 <td>1759</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>761</td>
-<td>4.4</td>
+<td>2.2</td>
 <td>1228</td>
-<td>5.4</td>
+<td>2.7</td>
 <td>1480</td>
 </tr>
 </tbody>
```
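For reference, a "speedup" figure like the 1.70x and 2.80x quoted in this diff is conventionally the ratio of the unquantized baseline's per-request latency to the quantized model's latency under the same workload. A minimal sketch of that arithmetic (the latency values below are made-up illustrations, not benchmark results):

```python
def speedup(baseline_latency_s: float, quantized_latency_s: float) -> float:
    """Speedup of a quantized model over its baseline: the ratio of
    per-request latencies measured under identical conditions."""
    return baseline_latency_s / quantized_latency_s

# Hypothetical numbers for illustration only: a request the baseline
# serves in 4.76 s and the quantized model serves in 2.80 s.
print(f"{speedup(4.76, 2.80):.2f}x")  # → 1.70x
```

In single-stream deployment this latency ratio equals the throughput ratio, since one request is in flight at a time; under multi-stream asynchronous load the two can diverge, which is why the README reports the two speedups separately.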