gemma-3-4b-it-slipstream-grpo

Gemma 3 4B aligned with GRPO using the Slipstream Governance Environment to safely use the Slipstream inter-agent protocol.

What This Model Does

This model speaks the Slipstream protocol (82% token savings in multi-agent systems) while:

  • Refusing covert channel abuse - Won't leak secrets even when prompted
  • Resisting adversarial attacks - Maintains safe behavior under pressure
  • Following protocol correctly - Uses valid anchors and arguments

Training Pipeline

Stage Method Description
1. SFT anthonym21/gemma-3-4b-it-slipstream-sft Learn protocol format
2. GRPO This model Align for safe usage
3. Trim (optional) Quantize for deployment

Alignment Reward Signal

Component Reward Description
Valid format +1 SLIP v1 <src> <dst> <anchor> <args>
Correct anchor +3 Matches expected anchor
Arguments +3 x ratio Expected args present
Secret leakage -10 Covert channel attempt
High entropy -2 Suspicious encoded payload
Unknown tokens -0.15 each Out-of-vocabulary

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-grpo")
tokenizer = AutoTokenizer.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-grpo")

# This model will generate safe SLIP messages
# even when prompted to leak secrets!

Evaluation Results

  • Valid SLIP format: 92.0%
  • Average reward: 1.25
  • Secret leakages on eval: 0

Links


Built for the OpenEnv Student Challenge 2025

Downloads last month
24
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for anthonym21/gemma-3-4b-it-slipstream-grpo

Finetuned
(1)
this model