# Apertus-8B-Instruct-2509-W8A16
This is an INT8 weight-only quantized version of [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).
## 💡 What this means in practice
- Only the weights are quantized to 8-bit integers (INT8)
- Activations remain FP16/BF16
- No FP8 is used in this configuration
- Faster inference, reduced memory, minimal accuracy loss
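As a minimal usage sketch, the checkpoint can be served with vLLM, which supports compressed-tensors W8A16 checkpoints. The repository id below is a placeholder, not the actual hosted location of this model:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- substitute the actual location of this checkpoint.
llm = LLM(model="<your-org>/Apertus-8B-Instruct-2509-W8A16")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is INT8 weight-only quantization?"], params)
print(outputs[0].outputs[0].text)
```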
## Quantization Details
- Quantization Scheme: W8A16 (INT8 weights, FP16 activations)
- Method: weight-only INT8 quantization
- Targets: all `Linear` layers
- Ignored Layers: `lm_head` (kept in higher precision)
- Tool: llm-compressor
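For reference, a sketch of how a checkpoint like this can be produced with llm-compressor. This is an assumed recipe based on the details above, not the exact script used here; import paths and argument names may differ across llm-compressor versions:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# W8A16: INT8 weights, FP16/BF16 activations; lm_head kept in higher precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head"],
)

# Weight-only round-to-nearest quantization requires no calibration data,
# so oneshot can run data-free. The output directory name is illustrative.
oneshot(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    recipe=recipe,
    output_dir="Apertus-8B-Instruct-2509-W8A16",
)
```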