Fine-tuned Gemma 7B for Customer Support
This model is a fine-tuned version of Google's Gemma 7B model, specifically optimized for customer support chatbot applications. It has been trained using LoRA (Low-Rank Adaptation) on the Bitext customer support dataset to provide helpful and accurate responses to customer inquiries.
Model Details
Model Description
This is a fine-tuned version of Google's Gemma 7B model that has been optimized for customer support tasks. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model for customer service scenarios while maintaining the original model's capabilities. It can handle various customer support queries including payment options, product information, troubleshooting, and general assistance.
- Developed by: Dhruv-2902
- Model type: Causal Language Model (Fine-tuned)
- Language(s): English
- Finetuned from model: google/gemma-7b
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset
Model Sources
- Repository: Dhruv-2902/fine-tuned-gemma7b-customer-support
- Base Model: google/gemma-7b
- Training Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset
Uses
Direct Use
This model is designed for customer support chatbot applications. It can be used directly to:
- Answer customer inquiries about products and services
- Provide information about payment options and policies
- Assist with troubleshooting common issues
- Handle general customer service requests
- Generate helpful and contextually appropriate responses in customer support scenarios
Downstream Use
The model can be integrated into:
- Customer service platforms and chatbots
- Help desk systems
- E-commerce customer support tools
- Automated customer service applications
- Voice assistants for customer support
Out-of-Scope Use
This model should not be used for:
- General-purpose text generation outside customer support contexts
- Tasks that require real-time information or knowledge of recent events
- Tasks requiring domain expertise beyond customer service
- Generation of harmful, biased, or inappropriate content
- Legal, medical, or financial advice
Bias, Risks, and Limitations
The model inherits limitations from the base Gemma 7B model and may exhibit:
- Biases present in the training data
- Potential for generating incorrect or inappropriate responses
- Limitations in understanding complex or nuanced customer issues
- Possible inconsistencies in response quality
- Language limitations (primarily English-focused)
Recommendations
Users should:
- Implement appropriate content filtering and monitoring
- Provide human oversight for complex customer issues
- Regularly evaluate model performance and update as needed
- Be aware of potential biases and work to mitigate them
- Test thoroughly before deploying in production environments
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Dhruv-2902/fine-tuned-gemma7b-customer-support")

# Load base model with 4-bit quantization for efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
model.eval()

# Generate a response for a given prompt
def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # required for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.9
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
response = generate_response("help me see your allowed payment options")
print(response)
```
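Because the adapter was trained on examples in the `Instruction: ... / Response: ...` format described under Training Data, prompts wrapped in the same template will generally match the fine-tuning distribution more closely. A minimal sketch; the helper name and the exact separator are illustrative assumptions, not part of the repository:

```python
# Hypothetical helper: wrap a raw customer query in the training-time template
# so that the model continues with the "Response:" portion.
def format_prompt(query):
    return f"Instruction: {query}\nResponse:"

response = generate_response(format_prompt("help me see your allowed payment options"))
print(response)
```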
Training Details
Training Data
The model was fine-tuned on the Bitext Customer Support LLM Chatbot Training Dataset. The dataset was preprocessed by combining instruction and response pairs into a single text field with the format:
```
Instruction: [customer query]
Response: [support response]
```
The dataset was split 80/20 into training and test sets.
Training Procedure
Preprocessing
- Dataset loaded from Hugging Face Hub
- Combined instruction and response fields into single "text" field
- Applied 80/20 train/test split with seed=42
- No additional preprocessing applied (the sketch below walks through these steps)
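A minimal sketch of these preprocessing steps with the `datasets` library; the column names (`instruction`, `response`) follow the Bitext dataset, while the exact separator between the two fields is an assumption:

```python
from datasets import load_dataset

# Load the Bitext customer support dataset from the Hugging Face Hub
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")["train"]

# Combine instruction and response pairs into a single "text" field
def to_text(example):
    return {"text": f"Instruction: {example['instruction']}\nResponse: {example['response']}"}

dataset = dataset.map(to_text)

# 80/20 train/test split with a fixed seed for reproducibility
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```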
Training Hyperparameters
- Training regime: fp16 mixed precision
- Learning rate: 5e-5
- Training epochs: 1
- Per device train batch size: 2
- Gradient accumulation steps: 4
- Effective batch size: 8
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: ["q_proj", "v_proj"]
- Quantization: 4-bit (NF4)
- Save strategy: Every 300 steps
- Logging steps: 50 (these settings appear in the trainer sketch below)
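A hedged sketch of a trainer setup matching these hyperparameters, using `transformers.TrainingArguments` with trl's `SFTTrainer`. Exact argument names vary across trl versions (`dataset_text_field` is accepted by older releases, while newer ones move such options into `SFTConfig`); `output_dir` and the variable names carried over from the earlier sketches are assumptions:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="gemma7b-customer-support",  # illustrative output path
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
    fp16=True,                      # mixed-precision training
    save_strategy="steps",
    save_steps=300,
    logging_steps=50,
)

trainer = SFTTrainer(
    model=base_model,         # 4-bit quantized base model
    args=training_args,
    train_dataset=train_ds,   # from the preprocessing sketch above
    peft_config=lora_config,  # LoRA settings, see the next subsection
    dataset_text_field="text",
)
trainer.train()
```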
LoRA Configuration
- Rank: 16
- Alpha: 32
- Target modules: Query and Value projections
- Dropout: 0.05
- Bias: None
- Task type: Causal Language Modeling
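The same configuration expressed with peft's `LoraConfig`:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # LoRA rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # query and value projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```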
Evaluation
Testing Data
The model was evaluated on a held-out test set (20% of the original dataset) consisting of customer support instruction-response pairs from the Bitext dataset.
Metrics
Standard language modeling metrics were used during training, including:
- Training loss monitoring
- Perplexity evaluation on the held-out test set (see the sketch below)
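Since the eval loss reported by the trainer is the mean token-level cross-entropy, perplexity follows directly as its exponential. A minimal sketch, assuming the `trainer` and `test_ds` objects from the sketches above:

```python
import math

# Evaluate on the held-out 20% split; "eval_loss" is mean cross-entropy per token
metrics = trainer.evaluate(eval_dataset=test_ds)
perplexity = math.exp(metrics["eval_loss"])
print(f"Test perplexity: {perplexity:.2f}")
```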
Model Architecture and Technical Specifications
Model Architecture
- Base Architecture: Gemma 7B (Transformer-based)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Parameters: ~7B (base) + LoRA adapters
- Quantization: 4-bit NF4 quantization supported
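For deployment without a runtime peft dependency, the LoRA deltas can optionally be merged back into the base weights. Merging is not supported on 4-bit quantized weights, so the base model must be loaded in full or half precision first; a hedged sketch (the output path is illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in half precision (merging requires unquantized weights)
base = AutoModelForCausalLM.from_pretrained("google/gemma-7b", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
merged = merged.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("gemma7b-customer-support-merged")
```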
Compute Infrastructure
Hardware Requirements
- Training: GPU with sufficient VRAM (16GB+ recommended)
- Inference: Can run on T4 GPUs with 4-bit quantization
- Memory: Reduced memory footprint due to LoRA fine-tuning
Software
- Framework: PyTorch
- Libraries:
- transformers
- peft (v0.15.2)
- trl (SFTTrainer)
- bitsandbytes (quantization)
- datasets
Environmental Impact
The model uses LoRA fine-tuning which significantly reduces computational requirements compared to full fine-tuning:
- Training time: Reduced due to LoRA efficiency
- Hardware requirements: Lower VRAM requirements
- Carbon footprint: Minimized through efficient training approach
Framework Versions
- PEFT: 0.15.2
- Transformers: Latest compatible version
- PyTorch: Latest compatible version
- bitsandbytes: for 4-bit quantization support (via `BitsAndBytesConfig`)
Citation
If you use this model, please cite:
BibTeX:
```bibtex
@misc{fine-tuned-gemma7b-customer-support,
  author = {Dhruv-2902},
  title = {Fine-tuned Gemma 7B for Customer Support},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dhruv-2902/fine-tuned-gemma7b-customer-support}
}
```
Also cite the base model:
```bibtex
@misc{gemma_2024,
  title = {Gemma: Open Models Based on Gemini Research and Technology},
  author = {{Gemma Team}},
  year = {2024},
  publisher = {Google}
}
```
Model Card Authors
Dhruv-2902
Model Card Contact
For questions or issues regarding this model, please contact through the Hugging Face model repository.