Fine-tuned Gemma 7B for Customer Support

This model is a fine-tuned version of Google's Gemma 7B model, specifically optimized for customer support chatbot applications. It has been trained using LoRA (Low-Rank Adaptation) on the Bitext customer support dataset to provide helpful and accurate responses to customer inquiries.

Model Details

Model Description

This is a fine-tuned version of Google's Gemma 7B model that has been optimized for customer support tasks. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model for customer service scenarios while maintaining the original model's capabilities. It can handle various customer support queries including payment options, product information, troubleshooting, and general assistance.

  • Developed by: Dhruv-2902
  • Model type: Causal Language Model (Fine-tuned)
  • Language(s): English
  • Finetuned from model: google/gemma-7b
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset

Model Sources

  • Repository: Dhruv-2902/fine-tuned-gemma7b-customer-support
  • Base Model: google/gemma-7b
  • Training Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset

Uses

Direct Use

This model is designed for customer support chatbot applications. It can be used directly to:

  • Answer customer inquiries about products and services
  • Provide information about payment options and policies
  • Assist with troubleshooting common issues
  • Handle general customer service requests
  • Generate helpful and contextually appropriate responses in customer support scenarios

Downstream Use

The model can be integrated into:

  • Customer service platforms and chatbots
  • Help desk systems
  • E-commerce customer support tools
  • Automated customer service applications
  • Voice assistants for customer support

Out-of-Scope Use

This model should not be used for:

  • General-purpose text generation outside customer support contexts
  • Content that requires real-time information or recent events
  • Tasks requiring domain expertise beyond customer service
  • Generation of harmful, biased, or inappropriate content
  • Legal, medical, or financial advice

Bias, Risks, and Limitations

The model inherits limitations from the base Gemma 7B model and may exhibit:

  • Biases present in the training data
  • Potential for generating incorrect or inappropriate responses
  • Limitations in understanding complex or nuanced customer issues
  • Possible inconsistencies in response quality
  • Language limitations (primarily English-focused)

Recommendations

Users should:

  • Implement appropriate content filtering and monitoring
  • Provide human oversight for complex customer issues
  • Regularly evaluate model performance and update as needed
  • Be aware of potential biases and work to mitigate them
  • Test thoroughly before deploying in production environments

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Dhruv-2902/fine-tuned-gemma7b-customer-support")

# Load base model with quantization for efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
model.eval()

# Generate response
def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # required for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.9
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage (prompting in the training format described under Training Data)
response = generate_response("Instruction: help me see your allowed payment options\nResponse:")
print(response)
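
For deployment without loading the adapter separately at inference time, the LoRA weights can optionally be merged into a full-precision copy of the base model. The sketch below is illustrative (the output directory name is an assumption, not part of this repository) and requires enough memory to hold the unquantized 7B weights; it reuses the imports and tokenizer from the snippet above.

# Optional: merge the LoRA adapter into a full-precision copy of the base model
# so it can be served as a standalone checkpoint (output path is illustrative)
merge_base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
merged = PeftModel.from_pretrained(merge_base, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
merged = merged.merge_and_unload()  # fold the LoRA weights into the base weights
merged.save_pretrained("gemma7b-customer-support-merged")
tokenizer.save_pretrained("gemma7b-customer-support-merged")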

Training Details

Training Data

The model was fine-tuned on the Bitext Customer Support LLM Chatbot Training Dataset. The dataset was preprocessed by combining instruction and response pairs into a single text field with the format:

Instruction: [customer query]
Response: [support response]

The dataset was split 80/20 into training and test sets.

Training Procedure

Preprocessing

  • Dataset loaded from Hugging Face Hub
  • Combined instruction and response fields into single "text" field
  • Applied 80/20 train/test split with seed=42
  • No additional preprocessing was applied (a code sketch of these steps follows this list)
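
A minimal sketch of these preprocessing steps is shown below. It assumes the instruction and response column names of the Bitext dataset and is not the exact training script; the helper function name is illustrative.

from datasets import load_dataset

# Load the Bitext customer support dataset from the Hugging Face Hub
dataset = load_dataset(
    "bitext/Bitext-customer-support-llm-chatbot-training-dataset",
    split="train"
)

# Combine instruction and response into a single "text" field
def to_text(example):
    example["text"] = (
        f"Instruction: {example['instruction']}\nResponse: {example['response']}"
    )
    return example

dataset = dataset.map(to_text)

# 80/20 train/test split with a fixed seed
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]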

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Learning rate: 5e-5
  • Training epochs: 1
  • Per device train batch size: 2
  • Gradient accumulation steps: 4
  • Effective batch size: 8
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: ["q_proj", "v_proj"]
  • Quantization: 4-bit (NF4)
  • Save strategy: Every 300 steps
  • Logging steps: 50
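
The hyperparameters above translate roughly into the trl configuration sketched below. This is not the original training script: the output directory is illustrative, and argument names can differ slightly between trl versions.

from trl import SFTConfig

sft_config = SFTConfig(
    output_dir="gemma7b-customer-support-lora",  # illustrative
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 2 * 4 = 8
    fp16=True,
    save_strategy="steps",
    save_steps=300,
    logging_steps=50,
    dataset_text_field="text",
)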

LoRA Configuration

  • Rank: 16
  • Alpha: 32
  • Target modules: Query and Value projections
  • Dropout: 0.05
  • Bias: None
  • Task type: Causal Language Modeling
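
In code, this corresponds to a peft LoraConfig along the lines sketched below. The trainer assembly is also a sketch: base_model is the 4-bit quantized google/gemma-7b loaded as in the quickstart snippet, and train_ds, test_ds, and sft_config come from the earlier sketches.

from peft import LoraConfig
from trl import SFTTrainer

# LoRA adapter configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# SFTTrainer attaches the LoRA adapter to the quantized base model via peft_config
trainer = SFTTrainer(
    model=base_model,
    args=sft_config,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    peft_config=lora_config,
)
trainer.train()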

Evaluation

Testing Data

The model was evaluated on a held-out test set (20% of the original dataset) consisting of customer support instruction-response pairs from the Bitext dataset.

Metrics

Standard language modeling metrics were used during training, including:

  • Training loss monitoring
  • Perplexity evaluation on test set
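
Perplexity follows directly from the mean cross-entropy loss on the held-out split. A minimal sketch, assuming the trainer from the training sketches above:

import math

# Evaluate on the held-out 20% split and convert the mean loss to perplexity
eval_metrics = trainer.evaluate()
perplexity = math.exp(eval_metrics["eval_loss"])
print(f"eval loss: {eval_metrics['eval_loss']:.4f}, perplexity: {perplexity:.2f}")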

Model Architecture and Technical Specifications

Model Architecture

  • Base Architecture: Gemma 7B (Transformer-based)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Parameters: ~7B (base) + LoRA adapters
  • Quantization: 4-bit NF4 quantization supported
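
As a rough check of the adapter's size, PEFT can report trainable versus total parameter counts on a model prepared for training. This is a sketch only; base_model and lora_config come from the sketches above, and exact counts depend on the PEFT version and target modules.

from peft import get_peft_model

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # reports trainable vs. total parameters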

Compute Infrastructure

Hardware Requirements

  • Training: GPU with sufficient VRAM (16GB+ recommended)
  • Inference: Can run on T4 GPUs with 4-bit quantization
  • Memory: Reduced memory footprint due to LoRA fine-tuning

Software

  • Framework: PyTorch
  • Libraries:
    • transformers
    • peft (v0.15.2)
    • trl (SFTTrainer)
    • bitsandbytes (quantization)
    • datasets
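
A typical environment for the snippets above can be installed as follows. Only the PEFT version is documented; the other pins are left to the user, and accelerate is included because device_map="auto" relies on it.

pip install transformers peft==0.15.2 trl bitsandbytes datasets accelerate torch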

Environmental Impact

The model uses LoRA fine-tuning which significantly reduces computational requirements compared to full fine-tuning:

  • Training time: Reduced due to LoRA efficiency
  • Hardware requirements: Lower VRAM requirements
  • Carbon footprint: Minimized through efficient training approach

Framework Versions

  • PEFT: 0.15.2
  • Transformers: Latest compatible version
  • PyTorch: Latest compatible version
  • bitsandbytes: for 4-bit quantization support (via BitsAndBytesConfig)

Citation

If you use this model, please cite:

BibTeX:

@misc{fine-tuned-gemma7b-customer-support,
  author = {Dhruv-2902},
  title = {Fine-tuned Gemma 7B for Customer Support},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dhruv-2902/fine-tuned-gemma7b-customer-support}
}

Also cite the base model:

@misc{gemma_2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  author={Gemma Team},
  year={2024},
  publisher={Google}
}

Model Card Authors

Dhruv-2902

Model Card Contact

For questions or issues regarding this model, please contact through the Hugging Face model repository.
