TestMaster-7B-GGUF: Resource-Efficient Polyglot Unit Test Generator
TestMaster-7B is a specialized Large Language Model (LLM) fine-tuned to generate robust unit tests across multiple programming languages. Built on Qwen2.5-Coder-7B-Instruct and fine-tuned with Unsloth, the model is optimized for logical reasoning, edge-case detection, and mocking of dependencies.
This repository contains the 4-bit Quantized (GGUF - Q4_K_M) version, designed to run efficiently on consumer hardware (e.g., NVIDIA RTX 3070, 8GB VRAM) with minimal performance loss compared to the FP16 baseline.
Key Features
- Resource Efficient: Runs on ~4.5 GB of VRAM/RAM.
- Polyglot Capabilities: Generates tests for C, Java, Go, Python, C++, and JavaScript, and also handles ROS2 code.
- Advanced Reasoning: Capable of handling complex scenarios such as:
  - Memory Safety (null pointer checks in C).
  - Concurrency (Goroutines/WaitGroups in Go).
  - Reflection & Retry Logic (Java).
  - Async/Promise Mocking (JavaScript/Jest).
- ASTER Methodology: Trained with principles inspired by Automated Software Testing & Error Remediation, focusing on compilability and behavioral correctness.
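The ~4.5 GB figure above can be sanity-checked with a back-of-the-envelope calculation. The sketch below is only an estimate: the parameter count (7.61 billion for the Qwen2.5 7B family) and the average of about 4.85 bits per weight for Q4_K_M are assumptions, not values stated in this repository, and the KV cache for the chosen context size comes on top of the weights.

```python
# Rough weight-size estimate for a quantized model (illustrative only).
PARAMS = 7.61e9          # assumed parameter count of the 7B base model
Q4_K_M_BITS = 4.85       # assumed average bits per weight for Q4_K_M
FP16_BITS = 16.0         # bits per weight for the FP16 baseline


def estimate_weight_gb(params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


q4_gb = estimate_weight_gb(PARAMS, Q4_K_M_BITS)
fp16_gb = estimate_weight_gb(PARAMS, FP16_BITS)
print(f"Q4_K_M: ~{q4_gb:.1f} GB, FP16: ~{fp16_gb:.1f} GB")
```

Under these assumptions the quantized weights come out slightly above 4.5 GB, which is why the model fits on an 8 GB GPU while the FP16 baseline does not.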
Performance Evaluation
The model has been tested across several programming languages and testing scenarios. The table below summarizes the results:
| Language | Test Category | Difficulty | Success Rate | Key Observation |
|---|---|---|---|---|
| Python | Mocking & AsyncIO | 🔥🔥🔥 | 100% | Flawless usage of AsyncMock, IsolatedAsyncioTestCase and assert_awaited. |
| Java | JUnit 5 & Mockito | 🔥🔥 | 100% | Correctly migrated to JUnit 5 (ExtendWith); perfect Negative Testing & Verification logic. |
| JavaScript | Jest & API Mocking | 🔥🔥 | 100% | Proactively used axios-mock-adapter instead of manual mocks; clean async/await flow. |
| C# | xUnit & Moq | 🔥🔥 | 100% | Clean "Arrange-Act-Assert" structure; correct usage of It.IsAny and Attribute injection. |
| Go | Interface Mocking | 🔥🔥🔥 | 95% | Correct usage of testify/mock embedding; handled struct-interface relationships well. |
| C++ | Google Test/Mock | βοΈ | 98% | Updated to modern MOCK_METHOD syntax; correctly managed memory (pointers) in SetUp/TearDown. |
| Rust | Traits & Mockall | 🔥🔥🔥 | 95% | Successfully navigated ownership rules; correct usage of Box<dyn Trait> and #[automock]. |
| PHP | Backend Testing | 🔥🔥 | 100% | Chose industry-standard Mockery library over basic PHPUnit methods for better readability. |
| C | Memory Safety | 🔥🔥 | 100% | Proactively prevents SegFaults using sizeof and null checks. |
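To make the Python row concrete, here is a hand-written sketch of the style of test it describes. It is not captured model output, and `fetch_username` and its `client` argument are hypothetical names invented for the illustration.

```python
import unittest
from unittest.mock import AsyncMock


async def fetch_username(client, user_id: int) -> str:
    """Hypothetical function under test: awaits a client call, returns one field."""
    payload = await client.get_user(user_id)
    return payload["name"]


class FetchUsernameTest(unittest.IsolatedAsyncioTestCase):
    async def test_returns_name_from_client(self):
        # Attributes of an AsyncMock are themselves awaitable mocks.
        client = AsyncMock()
        client.get_user.return_value = {"name": "ada"}

        result = await fetch_username(client, 7)

        self.assertEqual(result, "ada")
        client.get_user.assert_awaited_once_with(7)

    async def test_propagates_client_error(self):
        # Negative path: the mocked call raises when awaited.
        client = AsyncMock()
        client.get_user.side_effect = ConnectionError("offline")

        with self.assertRaises(ConnectionError):
            await fetch_username(client, 7)
        client.get_user.assert_awaited()
```

Saved as a module, this can be run with `python -m unittest`; no event loop setup is needed because `IsolatedAsyncioTestCase` provides one per test.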
Training Log (Loss per Step)
| Step | Loss |
|---|---|
| 5 | 0.850300 |
| 10 | 0.857600 |
| 15 | 0.889200 |
| 20 | 0.849300 |
| 25 | 0.845600 |
| 30 | 0.801700 |
| 35 | 0.776000 |
| 40 | 0.811000 |
| 45 | 0.744900 |
| 50 | 0.742400 |
| 55 | 0.701500 |
| 60 | 0.728600 |
| 65 | 0.695400 |
| 70 | 0.631200 |
| 75 | 0.668300 |
| 80 | 0.602800 |
| 85 | 0.628700 |
| 90 | 0.657700 |
| 95 | 0.595500 |
| 100 | 0.592900 |
| 105 | 0.634900 |
| 110 | 0.665500 |
| 115 | 0.616500 |
| 120 | 0.615300 |
| 125 | 0.584500 |
| 130 | 0.622000 |
| 135 | 0.606700 |
| 140 | 0.577100 |
| 145 | 0.611500 |
| 150 | 0.579300 |
| 155 | 0.562100 |
| 160 | 0.595000 |
| 165 | 0.599200 |
| 170 | 0.547900 |
| 175 | 0.598000 |
| 180 | 0.566500 |
| 185 | 0.576300 |
| 190 | 0.543700 |
| 195 | 0.533000 |
| 200 | 0.575800 |
| 205 | 0.585400 |
| 210 | 0.555400 |
| 215 | 0.599300 |
| 220 | 0.528900 |
| 225 | 0.560100 |
| 230 | 0.579700 |
| 235 | 0.557400 |
| 240 | 0.518200 |
| 245 | 0.541800 |
| 250 | 0.534200 |
| 255 | 0.538100 |
| 260 | 0.570400 |
| 265 | 0.518400 |
| 270 | 0.527300 |
| 275 | 0.550300 |
| 280 | 0.536100 |
| 285 | 0.550300 |
| 290 | 0.551200 |
| 295 | 0.558500 |
| 300 | 0.529000 |
| 305 | 0.567800 |
| 310 | 0.530300 |
| 315 | 0.545600 |
| 320 | 0.529100 |
| 325 | 0.511600 |
| 330 | 0.538000 |
| 335 | 0.569400 |
| 340 | 0.524100 |
| 345 | 0.535100 |
| 350 | 0.573300 |
| 355 | 0.544000 |
| 360 | 0.547900 |
| 365 | 0.544900 |
| 370 | 0.533300 |
| 375 | 0.540300 |
| 380 | 0.543000 |
| 385 | 0.563800 |
| 390 | 0.514600 |
| 395 | 0.549900 |
| 400 | 0.562600 |
| 405 | 0.539200 |
| 410 | 0.580100 |
| 415 | 0.557300 |
| 420 | 0.555000 |
| 425 | 0.525200 |
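Because the logged loss is noisy from step to step, the trend is easier to read from averages. The short sketch below averages the first five and last five entries of the table above; the choice of a five-entry window is arbitrary.

```python
from statistics import mean

# Values copied from the table above.
first_five = [0.8503, 0.8576, 0.8892, 0.8493, 0.8456]  # steps 5 to 25
last_five = [0.5392, 0.5801, 0.5573, 0.5550, 0.5252]   # steps 405 to 425

start = mean(first_five)
end = mean(last_five)
drop = (start - end) / start  # relative reduction in mean loss

print(f"{start:.3f} -> {end:.3f} ({drop:.0%} lower)")
# prints "0.858 -> 0.551 (36% lower)"
```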
Usage
Prompt Format (ChatML)
This model uses the ChatML template. Strict adherence to this format is recommended for optimal results.
```
<|im_start|>system
You are an expert software tester. Your goal is to write a comprehensive unit test.<|im_end|>
<|im_start|>user
Instruction:
Write a unit test for the following [Language] code...
Input:
[Source Code Here]
<|im_end|>
<|im_start|>assistant
```
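A small helper can assemble this template so that the special tokens are never mistyped. The function below is a minimal sketch: `build_chatml_prompt` is a hypothetical name, and the instruction text is passed in by the caller because the template above leaves it open-ended.

```python
DEFAULT_SYSTEM = (
    "You are an expert software tester. "
    "Your goal is to write a comprehensive unit test."
)


def build_chatml_prompt(instruction: str, source_code: str,
                        system: str = DEFAULT_SYSTEM) -> str:
    """Fill the ChatML template shown above with an instruction and source code."""
    user_turn = f"Instruction:\n{instruction}\nInput:\n{source_code}\n"
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        "<|im_start|>assistant\n"  # leaves the assistant turn open for generation
    )


example = build_chatml_prompt(
    "Write a unit test for the following Python code.",  # illustrative wording
    "def divide(a, b):\n    return a / b",
)
print(example)
```

The returned string can be passed directly as the prompt to a GGUF runner, with `<|im_end|>` set as the stop sequence.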
Running with LM Studio / Ollama
- Download the .gguf file from this repository.
- Load it into your preferred GGUF runner (LM Studio, Ollama, etc.).
- System Prompt: Set the system prompt to: "You are an expert programmer and unit test generator."
- Context Window: Recommended setting is 4096 or 8192.
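For Ollama, the settings above are usually captured in a Modelfile. The sketch below generates one from Python; it is an illustration under assumptions, not a file shipped with this repository. The file name is borrowed from the llama-cpp-python example in this card, and whether Ollama picks up the ChatML template from the GGUF metadata without an explicit TEMPLATE entry depends on the Ollama version.

```python
def render_modelfile(gguf_path: str, system_prompt: str, num_ctx: int = 4096) -> str:
    """Render a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f'SYSTEM """{system_prompt}"""',
        f"PARAMETER num_ctx {num_ctx}",
        'PARAMETER stop "<|im_end|>"',  # end-of-turn token in ChatML
    ]
    return "\n".join(lines) + "\n"


modelfile = render_modelfile(
    "./unit-test-model-q4_k_m.gguf",  # assumed local file name
    "You are an expert programmer and unit test generator.",
    num_ctx=8192,
)
print(modelfile)
```

After writing the output to a file named Modelfile, a model can be registered with `ollama create testmaster -f Modelfile`, where `testmaster` is an arbitrary local name.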
Running with Python (llama-cpp-python)
```python
from llama_cpp import Llama

# Load the quantized model; point model_path at your downloaded .gguf file.
llm = Llama(
    model_path="./unit-test-model-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False,
)

# ChatML prompt; the newline after "assistant" opens the model's turn.
prompt = """<|im_start|>system
You are an expert unit test generator.<|im_end|>
<|im_start|>user
Write a Python unit test for a function that divides two numbers.<|im_end|>
<|im_start|>assistant
"""

output = llm(
    prompt,
    max_tokens=1024,
    stop=["<|im_end|>"],  # Stop at the end of the assistant turn
    echo=False,
)

print(output["choices"][0]["text"])
```
Limitations & Bias
- Python Imports: In very complex Python scenarios involving obscure libraries, the model may occasionally miss an import statement or hallucinate a mock path.
- Context Window: Very large source files may exceed the context window. Testing individual functions or classes is recommended.
- Self-Healing: For dynamic languages (Python, JS), we recommend running the model in a "Self-Healing" loop (Execution -> Error Capture -> Repair) so that failing tests are caught and repaired before use.
License
This model is a fine-tune of Qwen2.5-Coder and is licensed under Apache 2.0.
Acknowledgments
- Fine-tuned using Unsloth (2x faster training).
- Base model by Qwen Team.
- Dataset curated for ASTER (Automated Software Testing & Error Remediation) research.