Proprietary Invention Package – Ternary-Quantized Transformer Optimization

Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 21, 2025

Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
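The proprietary kernels themselves are not published in this package, but the core idea of ternary quantization can be illustrated with a BitNet-style absmean scheme. The sketch below is illustrative only: the function names and the per-tensor scale are assumptions, not the package's actual implementation.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} using BitNet-style
    absmean scaling: divide by the mean absolute value, then round
    and clip to the ternary set."""
    gamma = w.abs().mean().clamp(min=eps)    # per-tensor absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)   # ternary codes in {-1, 0, +1}
    return w_q.to(torch.int8), gamma         # codes + scale for dequant

def ternary_dequantize(w_q: torch.Tensor, gamma: torch.Tensor):
    """Recover an approximate FP tensor: w ~= gamma * w_q."""
    return w_q.float() * gamma

# Example: quantize a random projection matrix and reconstruct it.
w = torch.randn(4096, 4096)
codes, scale = ternary_quantize(w)
w_hat = ternary_dequantize(codes, scale)
```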

Contents:

  • license.md
  • NDA.md
  • invention_description.md
  • claims.md
  • performance_data.md
  • [Diagrams and attachments]

Confidential: All materials are proprietary. Contact the inventor for licensing discussions.

Benefits for the JiRack 8B Project

✅ Very easy fine-tuning: An 8B model becomes highly accessible with LoRA and a ~70% VRAM reduction, enabling fine-tuning on a single high-end consumer GPU or a dual mid-range setup.

Trainable Parameters (8B):

  • Base model (frozen): 8B parameters @ 2-bit = ~4.8 GB
  • LoRA adapters (r=8): ~4-8M parameters @ FP32 = ~32 MB
  • Total VRAM: ~8-10 GB (fits comfortably on an RTX 3080, RTX 4060 Ti, or Radeon RX 7700 XT); a back-of-envelope breakdown is sketched below
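The arithmetic behind this budget can be checked with a quick estimate. The sketch below is illustrative only: it assumes the ~4.8 GB base figure includes higher-precision embeddings, quantization scales, and packing overhead on top of the raw 2-bit payload, and it takes 6M as a midpoint for the adapter count.

```python
# Back-of-envelope VRAM estimate for LoRA fine-tuning on the 2-bit base.
# Illustrative numbers only; actual usage depends on the runtime.
GIB = 1024**3

base_params = 8e9
packed_2bit = base_params * 2 / 8 / GIB    # raw 2-bit payload: ~1.86 GiB
# The ~4.8 GB base figure above presumably also covers embeddings/head
# kept in higher precision, per-group scales, and packing overhead.

lora_params = 6e6                          # midpoint of the ~4-8M range
lora_fp32   = lora_params * 4 / GIB        # adapter weights: ~22 MiB
adam_states = 2 * lora_fp32                # Adam m/v, adapters only

print(f"raw 2-bit weights : {packed_2bit:.2f} GiB")
print(f"LoRA adapters     : {lora_fp32 * 1024:.1f} MiB")
print(f"optimizer states  : {adam_states * 1024:.1f} MiB")
# Activations, adapter gradients, and the KV cache account for the
# remainder of the ~8-10 GB total.
```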

Thermal Stability

✅ Since only a fraction of the parameters are updated, the thermal footprint remains consistent with the SWA Fusion goal of staying below 80°C.
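For concreteness, here is a minimal sketch of how such an adapter-only setup might look with Hugging Face PEFT. This is an assumption-laden illustration: the proprietary ternary checkpoint may not load through stock transformers, the repo id is taken from this card, and the target modules follow common Llama conventions.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumption: the ternary checkpoint loads via stock transformers; the
# actual proprietary loader may differ.
model = AutoModelForCausalLM.from_pretrained("kgrabko/JiRackTernary_8b")

config = LoraConfig(
    r=8,                                   # matches the r=8 adapters above
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # common Llama attention targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# get_peft_model freezes the base model; only adapter weights train,
# which is what keeps the compute (and heat) footprint small.
model = get_peft_model(model, config)
model.print_trainable_parameters()         # only the small adapter set trains
```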

JiRack Ternary 8B (kgrabko/JiRackTernary_8b) is built on BitNet layers and uses a tokenizer compatible with meta-llama/Llama-3.1-8B.

The model is distributed in two formats.
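Since the tokenizer is Llama-3.1-8B-compatible, tokenization can be sketched with the standard transformers API. Note that meta-llama/Llama-3.1-8B is a gated repository, so access must be granted on Hugging Face first.

```python
from transformers import AutoTokenizer

# Uses the upstream Llama tokenizer this card claims compatibility with.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

ids = tok("Ternary weights live in {-1, 0, +1}.")["input_ids"]
print(len(ids), "tokens:", ids[:8])
```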
