# Apertus-8B-Instruct-2509-W8A16
This is an INT8 weight-only quantized version of [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).
## 💡 What this means in practice
- Only the weights are quantized to 8-bit integers (INT8)
- Activations remain FP16/BF16
- No FP8 is used in this configuration
- Faster inference, reduced memory, minimal accuracy loss
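As a minimal usage sketch, the checkpoint can be served with vLLM, which supports compressed-tensors W8A16 checkpoints. The repository id below is a placeholder, not the actual hosted location of this model:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- substitute the actual location of this checkpoint.
llm = LLM(model="<your-org>/Apertus-8B-Instruct-2509-W8A16")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is INT8 weight-only quantization?"], params)
print(outputs[0].outputs[0].text)
```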
## Quantization Details
- Quantization Scheme: W8A16 (INT8 weights, FP16 activations)
- Method: weight-only INT8 quantization
- Targets: all `Linear` layers
- Ignored Layers: `lm_head` (kept in higher precision)
- Tool: llm-compressor
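For reference, a sketch of how a checkpoint like this can be produced with llm-compressor. This is an assumed recipe based on the details above, not the exact script used here; import paths and argument names may differ across llm-compressor versions:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# W8A16: INT8 weights, FP16/BF16 activations; lm_head kept in higher precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head"],
)

# Weight-only round-to-nearest quantization requires no calibration data,
# so oneshot can run data-free. The output directory name is illustrative.
oneshot(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    recipe=recipe,
    output_dir="Apertus-8B-Instruct-2509-W8A16",
)
```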