PHI-2-STEM-261125
A Fine-tuned Phi-2 Model Optimized for STEM Knowledge
Science, Technology, Engineering, Mathematics, and Ethics
Model Description
PHI-2-STEM-261125 is a fine-tuned version of Microsoft's Phi-2 (2.78B parameters) specifically optimized for generating accurate and comprehensive explanations across multiple STEM domains. The model was trained using INT8 quantization to enable efficient training on consumer-grade GPUs.
Key Features
- Multi-domain STEM expertise: Mathematics, Physics, Chemistry, Biology, and Ethics
- Efficient training: INT8 quantization enables training on 4GB VRAM GPUs
- High-quality curated dataset: 18 expert-written examples covering 11 specialized domains
- Significant loss reduction: 26% improvement from initial to final loss
Model Details
Model Information
| Property | Value |
|---|---|
| Model Name | PHI-2-STEM-261125 |
| Base Model | microsoft/phi-2 |
| Parameters | 2.78 billion |
| Architecture | Transformer (decoder-only) |
| Precision | FP16 (Safetensors) |
| Training Date | November 26, 2025 |
| License | MIT |
| DOI | 10.57967/hf/7105 |
Author Information
| Field | Value |
|---|---|
| Author | Francisco Molina Burgos |
| ORCID | 0009-0008-6093-8267 |
| Organization | Independent Researcher |
| Contact | [email protected] |
Training Details
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch Size | 1 (per device) |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 4 |
| Learning Rate | 1e-5 |
| Warmup Steps | 2 |
| Max Sequence Length | 512 tokens |
| Precision | FP16 (Mixed Precision) |
| Quantization | INT8 (BitsAndBytes) |
| Gradient Checkpointing | Enabled |
Hardware Specifications
| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 3050 (4GB VRAM) |
| CPU | Intel Core i7-12650H |
| RAM | 16GB |
| Training Time | ~30 minutes |
| VRAM Usage | ~3.5 GB |
Training Metrics
| Metric | Value |
|---|---|
| Initial Loss | 2.07 |
| Final Loss (3 epochs) | 1.65 |
| Final Loss (5 epochs) | 1.54 |
| Average Loss | 1.80 |
| Total Loss Reduction | ~26% |
Loss Progression
Epoch 1: Loss ~2.07 (initial)
Epoch 2: Loss ~1.85
Epoch 3: Loss ~1.65
Epoch 4: Loss ~1.58
Epoch 5: Loss ~1.54 (final)
Dataset
Overview
The model was trained on a curated dataset of 18 expert-written examples covering 11 specialized STEM domains. Each example provides a concise, technically accurate explanation of fundamental concepts.
Domain Distribution
| Domain | Examples | Topics Covered |
|---|---|---|
| Mathematics | 3 | Fundamental Theorem of Calculus, Riemann Hypothesis, Gödel's Incompleteness Theorems |
| Organic Chemistry | 2 | SN2 Reaction Mechanism, Molecular Orbital Theory (Benzene) |
| Quantum Chemistry | 1 | Density Functional Theory (DFT) |
| Quantum Physics | 2 | Quantum Entanglement, Heisenberg Uncertainty Principle |
| Physics | 1 | General Relativity (Einstein Field Equations) |
| Crystallography | 1 | X-ray Crystallography |
| Biochemistry | 1 | Enzyme Catalysis (Michaelis-Menten) |
| Pharmacology | 1 | Pharmacodynamics (Receptor Theory) |
| Ethics | 3 | Kant's Categorical Imperative, Bioethics, AI Ethics |
| Music Theory | 2 | Harmonic Analysis, Counterpoint |
| Art Theory | 1 | Golden Ratio |
Dataset Characteristics
- Format: Plain text explanations
- Language: English (technical/scientific)
- Average Length: ~100-150 tokens per example
- Quality: Expert-curated, factually accurate
- Coverage: Fundamental concepts across STEM disciplines
Usage
Installation
pip install transformers torch accelerate
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_name = "Yatro/PHI-2-STEM-261125"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Generate text
prompt = "Explain the Heisenberg Uncertainty Principle:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=200,
temperature=0.7,
top_p=0.95,
do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Usage with INT8 Quantization (Low VRAM)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# INT8 configuration for GPUs with limited VRAM
quantization_config = BitsAndBytesConfig(
load_in_8bit=True,
llm_int8_threshold=6.0,
)
model = AutoModelForCausalLM.from_pretrained(
"Yatro/PHI-2-STEM-261125",
trust_remote_code=True,
quantization_config=quantization_config,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Yatro/PHI-2-STEM-261125", trust_remote_code=True)
# Generate
prompt = "What is Gödel's Incompleteness Theorem?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Prompt Templates
For best results, use clear, direct prompts:
# Good prompts
prompts = [
"Explain the Fundamental Theorem of Calculus:",
"What is the SN2 reaction mechanism in organic chemistry?",
"Describe quantum entanglement:",
"What are the four principles of bioethics?",
"Explain Kant's Categorical Imperative:",
]
# Structured format (also works well)
prompt = """Question: What is the Heisenberg Uncertainty Principle?
Answer:"""
Example Outputs
Quantum Physics
Prompt: "Explain quantum entanglement:"
Output:
Quantum entanglement describes correlations between particles that persist regardless of spatial separation. When particles are entangled, measuring one particle's state instantly determines the other's state. This phenomenon, verified experimentally through Bell inequality violations, challenges classical notions of locality and has applications in quantum computing and cryptography.
Organic Chemistry
Prompt: "What is the SN2 mechanism?"
Output:
The SN2 (bimolecular nucleophilic substitution) reaction proceeds through a single concerted step. The nucleophile attacks the electrophilic carbon from the backside, opposite to the leaving group. This results in inversion of stereochemistry (Walden inversion). The reaction is favored with primary substrates, good leaving groups, strong nucleophiles, and polar aprotic solvents.
Ethics
Prompt: "What is Kant's Categorical Imperative?"
Output:
Kant's Categorical Imperative is a fundamental principle in deontological ethics. It states: "Act only according to that maxim whereby you can at the same time will that it should become a universal law." This means moral actions must be universalizable - if everyone acted the same way, the action should remain coherent and beneficial. It emphasizes duty and rational moral principles over consequences.
Intended Use
Primary Use Cases
- Educational Content Generation: Creating explanations of STEM concepts for learning materials
- Research Assistance: Generating initial drafts of scientific explanations
- Tutoring Systems: Providing explanations in AI-assisted learning platforms
- Knowledge Retrieval: Answering questions about fundamental STEM concepts
- Content Augmentation: Enhancing educational content with accurate explanations
Target Users
- Educators and teachers
- Students (undergraduate and graduate level)
- Science communicators
- EdTech developers
- Researchers exploring LLM capabilities in STEM
Limitations
Known Limitations
- Small Training Dataset: Only 18 examples, limiting coverage of STEM topics
- Domain Specificity: Best performance on topics similar to training data
- No Real-time Information: Knowledge cutoff based on base model (Phi-2)
- Mathematical Reasoning: May struggle with complex mathematical derivations
- Hallucination Risk: May generate plausible-sounding but incorrect information
- Language: English only
Out-of-Scope Use Cases
- Medical diagnosis or treatment recommendations
- Legal advice
- Financial decisions
- Safety-critical applications
- Generating content presented as human-written without disclosure
Recommendations
- Always verify generated content against authoritative sources
- Use as a starting point, not as definitive truth
- Human review required for any published or educational content
- Not suitable for generating content on topics outside training domains
Ethical Considerations
Bias and Fairness
- The model inherits biases from the base Phi-2 model and training data
- Training data reflects Western academic perspectives on STEM
- Limited representation of non-Western scientific traditions
Environmental Impact
- Training was performed on consumer hardware (RTX 3050)
- Estimated carbon footprint: ~0.5 kg CO2 (30 minutes on 75W GPU)
- INT8 quantization reduced computational requirements significantly
Transparency
- Full training code and data are documented
- Model weights are openly available
- Limitations are clearly stated
Technical Specifications
Model Architecture
PHI-2-STEM-261125
├── Architecture: Transformer (decoder-only)
├── Hidden Size: 2560
├── Intermediate Size: 10240
├── Num Attention Heads: 32
├── Num Hidden Layers: 32
├── Vocab Size: 51200
├── Max Position Embeddings: 2048
├── Rotary Embedding Dimension: 32
└── Activation Function: GELU
File Structure
PHI-2-STEM-261125/
├── config.json # Model configuration
├── model.safetensors # Model weights (F16)
├── tokenizer.json # Tokenizer vocabulary
├── tokenizer_config.json # Tokenizer configuration
├── special_tokens_map.json # Special tokens mapping
└── README.md # This model card
Dependencies
transformers>=4.35.0
torch>=2.0.0
accelerate>=0.24.0
bitsandbytes>=0.41.0 # For INT8 quantization
safetensors>=0.4.0
Evaluation
Training Evaluation
| Metric | Value | Notes |
|---|---|---|
| Final Loss | 1.54 | After 5 epochs |
| Loss Reduction | 26% | From initial 2.07 |
| Convergence | Yes | Consistent decrease |
Qualitative Evaluation
The model was evaluated on:
- Factual Accuracy: High for trained domains
- Coherence: Strong sentence-level coherence
- Relevance: Good adherence to prompts
- Completeness: Adequate coverage of key concepts
Recommended Benchmarks
For comprehensive evaluation, consider:
| Benchmark | Purpose | Expected Performance |
|---|---|---|
| MMLU (STEM subset) | Multi-task knowledge | Improved on base |
| GSM8K | Mathematical reasoning | Baseline |
| ARC Challenge | Scientific reasoning | Improved |
| SciQ | Science questions | Improved |
Citation
BibTeX
@misc{molina_burgos_2025,
author = {Molina Burgos, Francisco},
title = {{PHI-2-STEM-261125} (Revision 54c4d49)},
year = 2025,
url = {https://huggingface.co/Yatro/PHI-2-STEM-261125},
doi = {10.57967/hf/7105},
publisher = {Hugging Face}
}
APA
Molina Burgos, F. (2025). PHI-2-STEM-261125 (Version 54c4d49) [Large language model]. Hugging Face. https://doi.org/10.57967/hf/7105
Related Work
Base Model
- Phi-2: microsoft/phi-2
- 2.7B parameter model trained on synthetic and web data
- Strong performance on reasoning benchmarks
Similar Models
Related Research
- Gunasekar, S., et al. (2023). "Textbooks Are All You Need"
- Li, Y., et al. (2023). "Phi-1.5: Training LLMs with Synthetic Data"
Acknowledgments
- Microsoft Research for the Phi-2 base model
- Hugging Face for the transformers library and model hosting
- BitsAndBytes team for efficient INT8 quantization
- The open-source ML community for tools and inspiration
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-11-26 | Initial release (5 epochs, loss 1.54) |
Contact & Support
- Issues: GitHub Issues
- Email: [email protected]
- HuggingFace: Yatro
Made with dedication for the advancement of AI in STEM education
Licensed under MIT - Free to use, modify, and distribute
- Downloads last month
- 4
Model tree for Yatro/PHI-2-STEM-261125
Base model
microsoft/phi-2Evaluation results
- Final Training Lossself-reported1.540
- Average Training Lossself-reported1.800
- Initial Lossself-reported2.070
- Loss Reductionself-reported26%