Proprietary Invention Package: Ternary-Quantized Transformer Optimization
Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 21, 2025
Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
Contents:
- license.md
- NDA.md
- invention_description.md
- claims.md
- performance_data.md
- [Diagrams and attachments]
Confidential: All materials are proprietary. Contact inventor for licensing discussions.
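The Overview above refers to ternary quantization; the proprietary kernels themselves are not included in this public summary. As a point of reference only, the following is a generic absmean-style ternary quantization sketch in PyTorch (in the spirit of BitNet b1.58, not the inventor's method), showing the kind of {-1, 0, +1} weight representation involved. All names are illustrative.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-8):
    """Generic absmean ternary quantization sketch (not the proprietary method).

    Maps full-precision weights to {-1, 0, +1} plus a per-tensor scale,
    so each weight can be stored in ~1.58 bits (2 bits in practice).
    """
    # Per-tensor scale: mean absolute value of the weights.
    scale = w.abs().mean().clamp(min=eps)
    # Scale, round to the nearest integer, and clip to the ternary set.
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

def ternary_dequantize(w_ternary: torch.Tensor, scale: torch.Tensor):
    # Reconstruct an approximate full-precision weight for reference use.
    return w_ternary * scale

if __name__ == "__main__":
    w = torch.randn(4096, 4096)
    q, s = ternary_quantize(w)
    print("unique values:", q.unique().tolist())   # [-1.0, 0.0, 1.0]
    print("mean reconstruction error:",
          (ternary_dequantize(q, s) - w).abs().mean().item())
```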
Benefits for the JiRack 8B Project
- Very easy fine-tuning: with LoRA and a ~70% VRAM reduction, fine-tuning an 8B model becomes highly accessible, fitting on a single high-end consumer GPU or a dual mid-range setup.
Trainable Parameters (8B):
- Base model (frozen): 8B parameters @ 2-bit = ~4.8 GB
- LoRA adapters (r=8): ~4-8M parameters @ FP32 = ~32 MB
- Total VRAM: ~8-10 GB (fits comfortably on an RTX 3080, RTX 4060 Ti, or AMD Radeon RX 7700 XT); see the LoRA sketch below.
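As a rough illustration of the figures above, here is a hedged Hugging Face PEFT/Transformers sketch of attaching r=8 LoRA adapters to a frozen base model. The repository id and target module names are assumptions, and actual VRAM use will vary with sequence length, batch size, and optimizer choice.

```python
# Hedged sketch: LoRA (r=8) on a frozen base model with Hugging Face PEFT.
# "kgrabko/JiRackTernary_8b" and the target module names are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "kgrabko/JiRackTernary_8b",   # assumed repo id
    torch_dtype="auto",
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=8,                                   # rank used in the estimate above
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical Llama-style projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)   # freezes the base, adds adapters
model.print_trainable_parameters()       # expect a few million trainable params

# Back-of-envelope size of the adapter weights alone (FP32):
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"adapter weights: ~{trainable * 4 / 1e6:.0f} MB")
```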
Thermal Stability
- Since only a fraction of the parameters are updated, the thermal footprint stays consistent with the SWA Fusion goal of remaining below 80°C.
JiRack Ternary 8B is built on BitNet layers and uses a tokenizer compatible with meta-llama/Llama-3.1-8B.
The model is distributed in two formats.
Model tree for kgrabko/JiRackTernary_8b:
- Base model: meta-llama/Llama-3.1-8B
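The two distribution formats are not spelled out above. As one hedged usage example, a Llama-3.1-8B-compatible tokenizer can typically be loaded with Transformers as shown below; the repo id is an assumption, and a BitNet-style checkpoint may need additional, model-specific loading code.

```python
# Hedged usage sketch: loading the Llama-3.1-8B-compatible tokenizer.
# The repo id is assumed; a gated Llama tokenizer may require HF access approval.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kgrabko/JiRackTernary_8b")
ids = tokenizer("Ternary quantization keeps weights in {-1, 0, +1}.")["input_ids"]
print(len(ids), "tokens")
```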