PHI-2-STEM-261125

A Fine-tuned Phi-2 Model Optimized for STEM Knowledge

Science, Technology, Engineering, Mathematics, and Ethics

Model Description

PHI-2-STEM-261125 is a fine-tuned version of Microsoft's Phi-2 (2.78B parameters) specifically optimized for generating accurate and comprehensive explanations across multiple STEM domains. The model was trained using INT8 quantization to enable efficient training on consumer-grade GPUs.

Key Features

Multi-domain STEM expertise: Mathematics, Physics, Chemistry, Biology, and Ethics
Efficient training: INT8 quantization enables training on 4GB VRAM GPUs
High-quality curated dataset: 18 expert-written examples covering 11 specialized domains
Significant loss reduction: 26% improvement from initial to final loss

Model Details

Model Information

Property	Value
Model Name	PHI-2-STEM-261125
Base Model	microsoft/phi-2
Parameters	2.78 billion
Architecture	Transformer (decoder-only)
Precision	FP16 (Safetensors)
Training Date	November 26, 2025
License	MIT
DOI	10.57967/hf/7105

Author Information

Field	Value
Author	Francisco Molina Burgos
ORCID	0009-0008-6093-8267
Organization	Independent Researcher
Contact	[email protected]

Training Details

Training Configuration

Parameter	Value
Epochs	5
Batch Size	1 (per device)
Gradient Accumulation Steps	4
Effective Batch Size	4
Learning Rate	1e-5
Warmup Steps	2
Max Sequence Length	512 tokens
Precision	FP16 (Mixed Precision)
Quantization	INT8 (BitsAndBytes)
Gradient Checkpointing	Enabled

Hardware Specifications

Component	Specification
GPU	NVIDIA GeForce RTX 3050 (4GB VRAM)
CPU	Intel Core i7-12650H
RAM	16GB
Training Time	~30 minutes
VRAM Usage	~3.5 GB

Training Metrics

Metric	Value
Initial Loss	2.07
Final Loss (3 epochs)	1.65
Final Loss (5 epochs)	1.54
Average Loss	1.80
Total Loss Reduction	~26%

Loss Progression

Epoch 1: Loss ~2.07 (initial)
Epoch 2: Loss ~1.85
Epoch 3: Loss ~1.65
Epoch 4: Loss ~1.58
Epoch 5: Loss ~1.54 (final)

Dataset

Overview

The model was trained on a curated dataset of 18 expert-written examples covering 11 specialized STEM domains. Each example provides a concise, technically accurate explanation of fundamental concepts.

Domain Distribution

Domain	Examples	Topics Covered
Mathematics	3	Fundamental Theorem of Calculus, Riemann Hypothesis, Gödel's Incompleteness Theorems
Organic Chemistry	2	SN2 Reaction Mechanism, Molecular Orbital Theory (Benzene)
Quantum Chemistry	1	Density Functional Theory (DFT)
Quantum Physics	2	Quantum Entanglement, Heisenberg Uncertainty Principle
Physics	1	General Relativity (Einstein Field Equations)
Crystallography	1	X-ray Crystallography
Biochemistry	1	Enzyme Catalysis (Michaelis-Menten)
Pharmacology	1	Pharmacodynamics (Receptor Theory)
Ethics	3	Kant's Categorical Imperative, Bioethics, AI Ethics
Music Theory	2	Harmonic Analysis, Counterpoint
Art Theory	1	Golden Ratio

Dataset Characteristics

Format: Plain text explanations
Language: English (technical/scientific)
Average Length: ~100-150 tokens per example
Quality: Expert-curated, factually accurate
Coverage: Fundamental concepts across STEM disciplines

Usage

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "Yatro/PHI-2-STEM-261125"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "Explain the Heisenberg Uncertainty Principle:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Usage with INT8 Quantization (Low VRAM)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# INT8 configuration for GPUs with limited VRAM
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModelForCausalLM.from_pretrained(
    "Yatro/PHI-2-STEM-261125",
    trust_remote_code=True,
    quantization_config=quantization_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Yatro/PHI-2-STEM-261125", trust_remote_code=True)

# Generate
prompt = "What is Gödel's Incompleteness Theorem?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Prompt Templates

For best results, use clear, direct prompts:

# Good prompts
prompts = [
    "Explain the Fundamental Theorem of Calculus:",
    "What is the SN2 reaction mechanism in organic chemistry?",
    "Describe quantum entanglement:",
    "What are the four principles of bioethics?",
    "Explain Kant's Categorical Imperative:",
]

# Structured format (also works well)
prompt = """Question: What is the Heisenberg Uncertainty Principle?
Answer:"""

Example Outputs

Quantum Physics

Prompt: "Explain quantum entanglement:"

Output:

Quantum entanglement describes correlations between particles that persist regardless of spatial separation. When particles are entangled, measuring one particle's state instantly determines the other's state. This phenomenon, verified experimentally through Bell inequality violations, challenges classical notions of locality and has applications in quantum computing and cryptography.

Organic Chemistry

Prompt: "What is the SN2 mechanism?"

Output:

The SN2 (bimolecular nucleophilic substitution) reaction proceeds through a single concerted step. The nucleophile attacks the electrophilic carbon from the backside, opposite to the leaving group. This results in inversion of stereochemistry (Walden inversion). The reaction is favored with primary substrates, good leaving groups, strong nucleophiles, and polar aprotic solvents.

Ethics

Prompt: "What is Kant's Categorical Imperative?"

Output:

Kant's Categorical Imperative is a fundamental principle in deontological ethics. It states: "Act only according to that maxim whereby you can at the same time will that it should become a universal law." This means moral actions must be universalizable - if everyone acted the same way, the action should remain coherent and beneficial. It emphasizes duty and rational moral principles over consequences.

Intended Use

Primary Use Cases

Educational Content Generation: Creating explanations of STEM concepts for learning materials
Research Assistance: Generating initial drafts of scientific explanations
Tutoring Systems: Providing explanations in AI-assisted learning platforms
Knowledge Retrieval: Answering questions about fundamental STEM concepts
Content Augmentation: Enhancing educational content with accurate explanations

Target Users

Educators and teachers
Students (undergraduate and graduate level)
Science communicators
EdTech developers
Researchers exploring LLM capabilities in STEM

Limitations

Known Limitations

Small Training Dataset: Only 18 examples, limiting coverage of STEM topics
Domain Specificity: Best performance on topics similar to training data
No Real-time Information: Knowledge cutoff based on base model (Phi-2)
Mathematical Reasoning: May struggle with complex mathematical derivations
Hallucination Risk: May generate plausible-sounding but incorrect information
Language: English only

Out-of-Scope Use Cases

Medical diagnosis or treatment recommendations
Legal advice
Financial decisions
Safety-critical applications
Generating content presented as human-written without disclosure

Recommendations

Always verify generated content against authoritative sources
Use as a starting point, not as definitive truth
Human review required for any published or educational content
Not suitable for generating content on topics outside training domains

Ethical Considerations

Bias and Fairness

The model inherits biases from the base Phi-2 model and training data
Training data reflects Western academic perspectives on STEM
Limited representation of non-Western scientific traditions

Environmental Impact

Training was performed on consumer hardware (RTX 3050)
Estimated carbon footprint: ~0.5 kg CO2 (30 minutes on 75W GPU)
INT8 quantization reduced computational requirements significantly

Transparency

Full training code and data are documented
Model weights are openly available
Limitations are clearly stated

Technical Specifications

Model Architecture

PHI-2-STEM-261125
├── Architecture: Transformer (decoder-only)
├── Hidden Size: 2560
├── Intermediate Size: 10240
├── Num Attention Heads: 32
├── Num Hidden Layers: 32
├── Vocab Size: 51200
├── Max Position Embeddings: 2048
├── Rotary Embedding Dimension: 32
└── Activation Function: GELU

File Structure

PHI-2-STEM-261125/
├── config.json              # Model configuration
├── model.safetensors        # Model weights (F16)
├── tokenizer.json           # Tokenizer vocabulary
├── tokenizer_config.json    # Tokenizer configuration
├── special_tokens_map.json  # Special tokens mapping
└── README.md                # This model card

Dependencies

transformers>=4.35.0
torch>=2.0.0
accelerate>=0.24.0
bitsandbytes>=0.41.0  # For INT8 quantization
safetensors>=0.4.0

Evaluation

Training Evaluation

Metric	Value	Notes
Final Loss	1.54	After 5 epochs
Loss Reduction	26%	From initial 2.07
Convergence	Yes	Consistent decrease

Qualitative Evaluation

The model was evaluated on:

Factual Accuracy: High for trained domains
Coherence: Strong sentence-level coherence
Relevance: Good adherence to prompts
Completeness: Adequate coverage of key concepts

Recommended Benchmarks

For comprehensive evaluation, consider:

Benchmark	Purpose	Expected Performance
MMLU (STEM subset)	Multi-task knowledge	Improved on base
GSM8K	Mathematical reasoning	Baseline
ARC Challenge	Scientific reasoning	Improved
SciQ	Science questions	Improved

Citation

BibTeX

@misc{molina_burgos_2025,
    author       = {Molina Burgos, Francisco},
    title        = {{PHI-2-STEM-261125} (Revision 54c4d49)},
    year         = 2025,
    url          = {https://huggingface.co/Yatro/PHI-2-STEM-261125},
    doi          = {10.57967/hf/7105},
    publisher    = {Hugging Face}
}

APA

Molina Burgos, F. (2025). PHI-2-STEM-261125 (Version 54c4d49) [Large language model]. Hugging Face. https://doi.org/10.57967/hf/7105

Related Work

Base Model

Phi-2: microsoft/phi-2
- 2.7B parameter model trained on synthetic and web data
- Strong performance on reasoning benchmarks

Similar Models

Related Research

Gunasekar, S., et al. (2023). "Textbooks Are All You Need"
Li, Y., et al. (2023). "Phi-1.5: Training LLMs with Synthetic Data"

Acknowledgments

Microsoft Research for the Phi-2 base model
Hugging Face for the transformers library and model hosting
BitsAndBytes team for efficient INT8 quantization
The open-source ML community for tools and inspiration

Version History

Version	Date	Changes
1.0.0	2025-11-26	Initial release (5 epochs, loss 1.54)

Contact & Support

Issues: GitHub Issues
Email: [email protected]
HuggingFace: Yatro

Made with dedication for the advancement of AI in STEM education

Licensed under MIT - Free to use, modify, and distribute

Downloads last month: 4

Safetensors

Model size

3B params

Tensor type

F16

Model tree for Yatro/PHI-2-STEM-261125

Base model

microsoft/phi-2

Adapter

(951)

this model

Evaluation results

Final Training Loss
self-reported

1.540
Average Training Loss
self-reported

1.800
Initial Loss
self-reported

2.070
Loss Reduction
self-reported

26%