Fine-tuned Gemma 7B for Customer Support

This model is a fine-tuned version of Google's Gemma 7B model, specifically optimized for customer support chatbot applications. It has been trained using LoRA (Low-Rank Adaptation) on the Bitext customer support dataset to provide helpful and accurate responses to customer inquiries.

Model Details

Model Description

This is a fine-tuned version of Google's Gemma 7B model that has been optimized for customer support tasks. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model for customer service scenarios while maintaining the original model's capabilities. It can handle various customer support queries including payment options, product information, troubleshooting, and general assistance.

  • Developed by: Dhruv-2902
  • Model type: Causal Language Model (Fine-tuned)
  • Language(s): English
  • Finetuned from model: google/gemma-7b
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset

Model Sources

  • Repository: Dhruv-2902/fine-tuned-gemma7b-customer-support
  • Base Model: google/gemma-7b
  • Training Dataset: bitext/Bitext-customer-support-llm-chatbot-training-dataset

Uses

Direct Use

This model is designed for customer support chatbot applications. It can be used directly to:

  • Answer customer inquiries about products and services
  • Provide information about payment options and policies
  • Assist with troubleshooting common issues
  • Handle general customer service requests
  • Generate helpful and contextually appropriate responses in customer support scenarios

Downstream Use

The model can be integrated into:

  • Customer service platforms and chatbots
  • Help desk systems
  • E-commerce customer support tools
  • Automated customer service applications
  • Voice assistants for customer support

Out-of-Scope Use

This model should not be used for:

  • General-purpose text generation outside customer support contexts
  • Content that requires real-time information or recent events
  • Tasks requiring domain expertise beyond customer service
  • Generation of harmful, biased, or inappropriate content
  • Legal, medical, or financial advice

Bias, Risks, and Limitations

The model inherits limitations from the base Gemma 7B model and may exhibit:

  • Biases present in the training data
  • Potential for generating incorrect or inappropriate responses
  • Limitations in understanding complex or nuanced customer issues
  • Possible inconsistencies in response quality
  • Language limitations (primarily English-focused)

Recommendations

Users should:

  • Implement appropriate content filtering and monitoring
  • Provide human oversight for complex customer issues
  • Regularly evaluate model performance and update as needed
  • Be aware of potential biases and work to mitigate them
  • Test thoroughly before deploying in production environments

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Dhruv-2902/fine-tuned-gemma7b-customer-support")

# Load base model with quantization for efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
model.eval()

# Generate response
def generate_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # required for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.9
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage (prompting in the training format described under Training Data)
response = generate_response("Instruction: help me see your allowed payment options\nResponse:")
print(response)
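
For deployment without loading the adapter separately at inference time, the LoRA weights can optionally be merged into a full-precision copy of the base model. The sketch below is illustrative (the output directory name is an assumption, not part of this repository) and requires enough memory to hold the unquantized 7B weights; it reuses the imports and tokenizer from the snippet above.

# Optional: merge the LoRA adapter into a full-precision copy of the base model
# so it can be served as a standalone checkpoint (output path is illustrative)
merge_base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
merged = PeftModel.from_pretrained(merge_base, "Dhruv-2902/fine-tuned-gemma7b-customer-support")
merged = merged.merge_and_unload()  # fold the LoRA weights into the base weights
merged.save_pretrained("gemma7b-customer-support-merged")
tokenizer.save_pretrained("gemma7b-customer-support-merged")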

Training Details

Training Data

The model was fine-tuned on the Bitext Customer Support LLM Chatbot Training Dataset. The dataset was preprocessed by combining instruction and response pairs into a single text field with the format:

Instruction: [customer query]
Response: [support response]

The dataset was split 80/20 into training and test sets.

Training Procedure

Preprocessing

  • Dataset loaded from Hugging Face Hub
  • Combined instruction and response fields into single "text" field
  • Applied 80/20 train/test split with seed=42
  • No additional preprocessing was applied (a code sketch of these steps follows this list)
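
A minimal sketch of these preprocessing steps is shown below. It assumes the instruction and response column names of the Bitext dataset and is not the exact training script; the helper function name is illustrative.

from datasets import load_dataset

# Load the Bitext customer support dataset from the Hugging Face Hub
dataset = load_dataset(
    "bitext/Bitext-customer-support-llm-chatbot-training-dataset",
    split="train"
)

# Combine instruction and response into a single "text" field
def to_text(example):
    example["text"] = (
        f"Instruction: {example['instruction']}\nResponse: {example['response']}"
    )
    return example

dataset = dataset.map(to_text)

# 80/20 train/test split with a fixed seed
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]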

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Learning rate: 5e-5
  • Training epochs: 1
  • Per device train batch size: 2
  • Gradient accumulation steps: 4
  • Effective batch size: 8
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: ["q_proj", "v_proj"]
  • Quantization: 4-bit (NF4)
  • Save strategy: Every 300 steps
  • Logging steps: 50
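
The hyperparameters above translate roughly into the trl configuration sketched below. This is not the original training script: the output directory is illustrative, and argument names can differ slightly between trl versions.

from trl import SFTConfig

sft_config = SFTConfig(
    output_dir="gemma7b-customer-support-lora",  # illustrative
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 2 * 4 = 8
    fp16=True,
    save_strategy="steps",
    save_steps=300,
    logging_steps=50,
    dataset_text_field="text",
)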

LoRA Configuration

  • Rank: 16
  • Alpha: 32
  • Target modules: Query and Value projections
  • Dropout: 0.05
  • Bias: None
  • Task type: Causal Language Modeling
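
In code, this corresponds to a peft LoraConfig along the lines sketched below. The trainer assembly is also a sketch: base_model is the 4-bit quantized google/gemma-7b loaded as in the quickstart snippet, and train_ds, test_ds, and sft_config come from the earlier sketches.

from peft import LoraConfig
from trl import SFTTrainer

# LoRA adapter configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# SFTTrainer attaches the LoRA adapter to the quantized base model via peft_config
trainer = SFTTrainer(
    model=base_model,
    args=sft_config,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    peft_config=lora_config,
)
trainer.train()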

Evaluation

Testing Data

The model was evaluated on a held-out test set (20% of the original dataset) consisting of customer support instruction-response pairs from the Bitext dataset.

Metrics

Standard language modeling metrics were used during training, including:

  • Training loss monitoring
  • Perplexity evaluation on test set
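
Perplexity follows directly from the mean cross-entropy loss on the held-out split. A minimal sketch, assuming the trainer from the training sketches above:

import math

# Evaluate on the held-out 20% split and convert the mean loss to perplexity
eval_metrics = trainer.evaluate()
perplexity = math.exp(eval_metrics["eval_loss"])
print(f"eval loss: {eval_metrics['eval_loss']:.4f}, perplexity: {perplexity:.2f}")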

Model Architecture and Technical Specifications

Model Architecture

  • Base Architecture: Gemma 7B (Transformer-based)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Parameters: ~7B (base) + LoRA adapters
  • Quantization: 4-bit NF4 quantization supported
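
As a rough check of the adapter's size, PEFT can report trainable versus total parameter counts on a model prepared for training. This is a sketch only; base_model and lora_config come from the sketches above, and exact counts depend on the PEFT version and target modules.

from peft import get_peft_model

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # reports trainable vs. total parameters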

Compute Infrastructure

Hardware Requirements

  • Training: GPU with sufficient VRAM (16GB+ recommended)
  • Inference: Can run on T4 GPUs with 4-bit quantization
  • Memory: Reduced memory footprint due to LoRA fine-tuning

Software

  • Framework: PyTorch
  • Libraries:
    • transformers
    • peft (v0.15.2)
    • trl (SFTTrainer)
    • bitsandbytes (quantization)
    • datasets
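
A typical environment for the snippets above can be installed as follows. Only the PEFT version is documented; the other pins are left to the user, and accelerate is included because device_map="auto" relies on it.

pip install transformers peft==0.15.2 trl bitsandbytes datasets accelerate torch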

Environmental Impact

The model uses LoRA fine-tuning which significantly reduces computational requirements compared to full fine-tuning:

  • Training time: Reduced due to LoRA efficiency
  • Hardware requirements: Lower VRAM requirements
  • Carbon footprint: Minimized through efficient training approach

Framework Versions

  • PEFT: 0.15.2
  • Transformers: Latest compatible version
  • PyTorch: Latest compatible version
  • bitsandbytes: for 4-bit quantization support (via BitsAndBytesConfig)

Citation

If you use this model, please cite:

BibTeX:

@misc{fine-tuned-gemma7b-customer-support,
  author = {Dhruv-2902},
  title = {Fine-tuned Gemma 7B for Customer Support},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dhruv-2902/fine-tuned-gemma7b-customer-support}
}

Also cite the base model:

@misc{gemma_2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  author={Gemma Team},
  year={2024},
  publisher={Google}
}

Model Card Authors

Dhruv-2902

Model Card Contact

For questions or issues regarding this model, please contact through the Hugging Face model repository.
