# RustMentor-1.7B-LiteRT
RustMentor-1.7B-LiteRT is a 1.7B-parameter Qwen3-based model fine-tuned for Rust programming education and code review. This repository hosts the LiteRT (.tflite) export of the model for on-device Android inference with GPU/NPU acceleration.
For the LoRA adapter, see rust-mentor-1.7b. For GGUF (llama.cpp/Ollama), see rust-mentor-1.7b-GGUF.
## Model Description
- Base Model: Qwen/Qwen3-1.7B
- Model Type: Causal LM (code tutoring + review)
- Parameters: 1.7B
- Context Length: 2048 tokens
- Fine-tuning: QLoRA (r=16, alpha=16) with Unsloth optimization
- Format: LiteRT .tflite (dynamic INT8 quantization)
- License: Apache 2.0
- Language: English, Rust code
## Why LiteRT?
LiteRT (formerly TFLite) is Google's on-device ML framework. Compared to GGUF/llama.cpp:
- GPU/NPU acceleration via NNAPI on Android (Tensor G3, Snapdragon, etc.)
- 2-3x faster inference on Pixel 8 Pro vs CPU-only GGUF
- Native Android SDK, so no JNI wrapper is needed
- KV cache optimized for mobile memory constraints
## What It Is Good At
- Explaining Rust ownership, borrowing, and lifetimes with Go/Python/TS comparisons
- Code review with borrow checker explanations
- Error handling patterns (Result, Option, ?, thiserror, anyhow)
- Async/await and Tokio patterns
- Smart pointers (Box, Rc, Arc, RefCell)
- Pattern matching and enum-based design
- Trait-based architecture and generics
- Type conversions (From, Into, AsRef, Deref)
- Serde & JSON serialization
- CLI tooling with clap
- Cargo project structure, modules, and workspaces
- Testing patterns and documentation
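As an illustration of the error-handling patterns listed above, here is a minimal sketch of the `Result`/`?` style the model is trained to explain to developers coming from Go's `if err != nil` idiom. The `parse_port` helper is hypothetical, written for this card rather than taken from the training data:

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse a port number, propagating failures with `?`
// instead of Go-style explicit error checks after every call.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?; // `?` returns early with the Err variant
    Ok(port)
}

fn main() {
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("not-a-port").is_err());
}
```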
## Intended Uses
Primary: Offline Rust programming tutor on Android (Pixel 8 Pro tested) via RustSensei app or Google AI Edge Gallery, with GPU/NPU-accelerated on-device inference.
Out-of-scope: General-purpose chat, non-Rust programming, safety-sensitive or factual tasks outside Rust development.
## Prompt Examples
"In Go, I just pass values or pointers. What's this ownership thing in Rust?"
"Review this Rust code and explain what the borrow checker is doing:\n\nfn get_longest(a: String, b: String) -> String {\n if a.len() > b.len() { a } else { b }\n}"
"How do I handle errors in Rust? I'm used to Go's if err != nil pattern."
"How does async work in Rust? In Go I just use goroutines and it's simple."
## How to Use

### Google AI Edge Gallery (Android)
1. Install Google AI Edge Gallery from the Play Store
2. Import the .tflite model from this repo
3. Chat offline with GPU/NPU acceleration
### LiteRT-LM (Programmatic, Android/Kotlin)

```kotlin
// Add to build.gradle.kts:
// implementation("com.google.ai.edge:litert-lm:latest")
import com.google.ai.edge.litert.lm.LlmInference

val options = LlmInference.Options.builder()
    .setModelPath("/path/to/rust_mentor_1.7b.tflite")
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .setTopP(0.9f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("Explain Rust's ownership model to a Go developer")
```
### MediaPipe LLM Inference (Alternative)

```python
import mediapipe as mp

model_path = "rust_mentor_1.7b_q8_ekv2048.tflite"
llm = mp.tasks.genai.LlmInference.create_from_options(
    mp.tasks.genai.LlmInferenceOptions(model_path=model_path, max_tokens=512)
)
response = llm.generate_response("How do I handle errors in Rust?")
```
## Training Data (Summary)
- Strandset-Rust-v1: 3,000 samples of Rust code generation, review, refactoring, and bug detection tasks
- Synthetic tutor conversations: 46 unique hand-crafted Rust tutoring dialogues across 28 topics, covering ownership, error handling, traits, async, smart pointers, macros, serde, testing, and more
- Style: All conversations draw parallels to Go/Python/TypeScript equivalents
## Training Configuration (QLoRA)
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-1.7B |
| Method | QLoRA via Unsloth |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch Size | 2 x 4 (effective 8) |
| Learning Rate | 2e-4 (cosine schedule) |
| Max Sequence Length | 2048 |
| Hardware | NVIDIA A100 40GB (Google Colab) |
## Export Configuration (LiteRT)
| Parameter | Value |
|---|---|
| Conversion Tool | litert-torch (re-authored Qwen3) |
| Quantization | Dynamic INT8 |
| KV Cache Length | 2048 |
| Prefill Lengths | 8, 64, 128, 256, 512, 1024 |
| Output Format | .tflite (TFLite Flatbuffers) |
## Safety & Limitations
- May generate incorrect code or hallucinate crate APIs; review before production use.
- Not a replacement for the Rust compiler or clippy; always compile and test generated code.
- Optimized for tutoring, not production code generation at scale.
- Training data focuses on CLI/systems patterns; web framework coverage (Axum, Actix) is limited.
## License
Apache 2.0 for the fine-tuned model; base model (Qwen/Qwen3-1.7B) license also applies.
## Contact
- Maintainer: Sylvester Francis (@sylvester-francis)
- Repository: github.com/sylvester-francis/slm-rust-model
- Issues/feedback: Open a discussion on the model repo