Apertus-8B-Instruct-2509-W8A16

This is an INT8 weight-only quantized version of swiss-ai/Apertus-8B-Instruct-2509, produced with llm-compressor.

💡 What this means in practice

  • Only the weights are quantized to 8-bit integers (INT8)
  • Activations remain FP16/BF16
  • No FP8 is used in this configuration
  • Faster inference and a smaller memory footprint, with minimal accuracy loss (see the serving sketch below)
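
For illustration, here is a minimal serving sketch with vLLM, which can load compressed-tensors W8A16 checkpoints. The repository id below is a placeholder assumption; substitute this model's actual path.

```python
# Hedged sketch: serve the W8A16 checkpoint with vLLM.
# "your-org/Apertus-8B-Instruct-2509-W8A16" is a placeholder repo id.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/Apertus-8B-Instruct-2509-W8A16")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain INT8 weight-only quantization in one sentence."],
    params,
)
print(outputs[0].outputs[0].text)
```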

Quantization Details

  • Quantization Scheme: W8A16 (INT8 weights, FP16/BF16 activations)
  • Method: Weight-only INT8 quantization
  • Targets: All Linear layers
  • Ignored Layers: lm_head (kept in higher precision)
  • Tool: llm-compressor (recipe sketch below)
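
The configuration above can be reproduced with a recipe along these lines. This is a minimal sketch assuming the current llm-compressor oneshot API, not the exact script used to build this checkpoint; the output directory name is an assumption.

```python
# Hedged sketch: weight-only INT8 (W8A16) quantization with llm-compressor.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "swiss-ai/Apertus-8B-Instruct-2509"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize all Linear layers to INT8 weights; keep lm_head in higher precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head"],
)

# W8A16 is data-free (round-to-nearest weight quantization),
# so no calibration dataset is needed.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format (assumed output path).
model.save_pretrained("Apertus-8B-Instruct-2509-W8A16", save_compressed=True)
tokenizer.save_pretrained("Apertus-8B-Instruct-2509-W8A16")
```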