GPT-OSS 20B ONNX (RTN Quantized for CPU)

This model is an optimized version of gpt-oss-20b, converted to enable local inference on CPUs using RTN (Round-to-Nearest) quantization.

Model Summary

  • Developed by: Microsoft
  • Model Type: ONNX
  • License: Apache-2.0
  • Optimization: RTN Quantization for efficient CPU memory usage
  • Target Hardware: Optimized for local inference on CPUs

Technical Description

This repository contains a conversion of the gpt-oss-20b model tailored specifically for local inference on CPUs. By combining the ONNX format with RTN quantization, the model achieves a significant reduction in RAM footprint while preserving the core capabilities of the base architecture. This makes it suitable for environments where GPU acceleration is not available.
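To illustrate the idea behind RTN, here is a minimal, self-contained sketch of symmetric round-to-nearest quantization for a list of weights. This is an educational toy, not the scheme used in this repository: the actual conversion applies its own bit width, grouping, and packing, and the function names below (`rtn_quantize`, `rtn_dequantize`) are hypothetical.

```python
def rtn_quantize(weights, bits=8):
    # Symmetric RTN: pick one scale per tensor, then round each
    # weight to the nearest representable integer level.
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def rtn_dequantize(q, scale):
    # Recover approximate float weights from the integer codes.
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 1.0]
q, s = rtn_quantize(w)
w_hat = rtn_dequantize(q, s)
```

Each dequantized value differs from the original by at most about half a quantization step, which is why RTN cuts memory roughly 4x (float32 to int8) with only a small accuracy cost.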

Base Model Information

For detailed information regarding the architecture, training data, and intended use cases, please refer to the original gpt-oss-20b model on Azure AI Foundry.

Deployment

This model is compatible with ONNX Runtime (ORT) using the default CPU Execution Provider.
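As a rough sketch of what deployment could look like, the snippet below uses the ONNX Runtime generate() API (the `onnxruntime-genai` package) on CPU. The local directory name `MODEL_DIR` is an assumption, and the exact generation loop may differ between package versions; treat this as a starting point rather than a verified recipe for this model.

```python
import os

# Hypothetical path to the downloaded model files (genai_config.json + ONNX weights).
MODEL_DIR = "./gpt-oss-20b-onnx"

def generate(prompt: str, max_length: int = 256) -> str:
    # Imported inside the function so the sketch degrades gracefully
    # when onnxruntime-genai is not installed.
    import onnxruntime_genai as og

    model = og.Model(MODEL_DIR)            # uses the CPU execution provider by default
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))

if os.path.isdir(MODEL_DIR):
    print(generate("Explain RTN quantization in one sentence."))
```

Because the model is quantized for CPU, no GPU-specific execution provider needs to be configured; ORT falls back to `CPUExecutionProvider` automatically.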
