Field-adaptive-query-generator
Model Details
Model Description
A text generation model fine-tuned to produce search queries from presentation template metadata. It uses LoRA adapters to efficiently fine-tune Google Gemma-3-4B, generating diverse and relevant search queries as a component of the Field-Adaptive Dense Retrieval framework.
Developed by: Mudasir Syed (mudasir13cs)
Model type: Causal Language Model with LoRA
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
Paper: Field-Adaptive Dense Retrieval of Structured Documents
Model Sources
- Repository: https://github.com/mudasir13cs/hybrid-search
- Paper: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE12352544
- Base Model: https://huggingface.co/unsloth/gemma-3-4b-it-unsloth-bnb-4bit
Uses
Direct Use
This model is designed for generating search queries from presentation template metadata including titles, descriptions, industries, categories, and tags. It serves as a key component in the Field-Adaptive Dense Retrieval system for structured documents.
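As an illustration, a small helper along these lines can map the metadata fields into the chat prompt format used in the quick-start section below (the build_prompt function is hypothetical, not part of the released code):

# Hypothetical helper: formats template metadata into the Gemma chat prompt
# expected by this model (see "How to Get Started with the Model" below).
def build_prompt(title, description, industries, categories, tags, n_queries=8):
    return (
        "<start_of_turn>user\n"
        f"Generate {n_queries} different search queries that users might use "
        "to find this presentation template:\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Industries: {', '.join(industries)}\n"
        f"Categories: {', '.join(categories)}\n"
        f"Tags: {', '.join(tags)}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt(
    "Modern Business Presentation",
    "A minimalist business template...",
    industries=["Business", "Marketing"],
    categories=["Corporate", "Professional"],
    tags=["Modern", "Clean", "Professional"],
)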
Downstream Use
- Content generation systems
- SEO optimization tools
- Template recommendation engines
- Automated content creation
- Field-adaptive search query generation
- Dense retrieval systems for structured documents
- Query expansion and reformulation (see the sketch after this list)
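For query expansion, the generated queries can be embedded alongside the original query to widen a dense-retrieval search. A minimal sketch, assuming sentence-transformers is installed; the encoder choice is illustrative and not prescribed by the paper:

# Minimal query-expansion sketch. The generated queries are assumed to come
# from the model call shown in "How to Get Started with the Model".
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

def expand_and_embed(user_query, generated_queries):
    # Embed the original query together with the generated variants,
    # then average into a single expanded query vector.
    vectors = encoder.encode([user_query] + generated_queries, normalize_embeddings=True)
    return vectors.mean(axis=0)

expanded = expand_and_embed(
    "modern business slides",
    ["minimalist corporate presentation template", "clean professional pitch deck"],
)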
Out-of-Scope Use
- Factual information generation
- Medical or legal advice
- Harmful content generation
- Tasks unrelated to presentation templates or structured document retrieval
Bias, Risks, and Limitations
- The model may generate biased or stereotypical content based on training data
- Generated content should be reviewed for accuracy and appropriateness
- Performance depends on input quality and relevance
- Model outputs are optimized for presentation template domain
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "mudasir13cs/Field-adaptive-query-generator",
    torch_dtype=torch.float16,  # half precision; a GPU is recommended
    device_map="auto",          # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained("mudasir13cs/Field-adaptive-query-generator")

# Format the prompt using the Gemma chat template
input_text = """<start_of_turn>user
Generate 8 different search queries that users might use to find this presentation template:
Title: Modern Business Presentation
Description: This modern business presentation template features a minimalist design...
Industries: Business, Marketing
Categories: Corporate, Professional
Tags: Modern, Clean, Professional<end_of_turn>
<start_of_turn>model
"""

# Generate queries; decode only the new tokens so the prompt is not echoed
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
generated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(generated_text)
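If the repository hosts LoRA adapter weights rather than merged weights (this card does not say which), the adapter can instead be attached to the base model with PEFT; a sketch under that assumption:

# Alternative loading path, assuming the repo contains LoRA adapter weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit", device_map="auto"
)
model = PeftModel.from_pretrained(base, "mudasir13cs/Field-adaptive-query-generator")
tokenizer = AutoTokenizer.from_pretrained("mudasir13cs/Field-adaptive-query-generator")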
Training Details
Training Data
- Dataset: Presentation template dataset with metadata
- Size: Not stated; a custom dataset of template–query pairs
- Source: Curated presentation template collection from structured documents
- Domain: Presentation templates with field-adaptive metadata
Training Procedure
- Architecture: Google Gemma-3-4B with LoRA adapters
- Base Model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
- Loss Function: Cross-entropy loss
- Optimizer: AdamW
- Learning Rate: 2e-4
- Batch Size: 4
- Epochs: 3
- Framework: Unsloth for efficient fine-tuning
Training Hyperparameters
- Training regime: Supervised fine-tuning with LoRA (PEFT)
- LoRA Rank: 16
- LoRA Alpha: 32
- Hardware: GPU (NVIDIA)
- Training time: ~3 hours
- Fine-tuning method: Parameter-Efficient Fine-Tuning (PEFT)
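A configuration along these lines reproduces the hyperparameters above. This is a minimal sketch using the Unsloth and TRL APIs; the target modules, max_seq_length, and toy dataset are assumptions not stated in this card, and exact argument names vary across TRL versions:

# Sketch of the LoRA setup with Unsloth, matching the hyperparameters above.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=2048,  # assumption; not stated in this card
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # LoRA rank
    lora_alpha=32,  # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
)

# Toy example of a template-metadata -> queries pair in Gemma chat format
train_data = Dataset.from_list([{
    "text": "<start_of_turn>user\nGenerate 8 different search queries ...\n"
            "Title: Modern Business Presentation<end_of_turn>\n"
            "<start_of_turn>model\nmodern business slides\n...<end_of_turn>"
}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=4,  # batch size 4
        num_train_epochs=3,             # 3 epochs
        learning_rate=2e-4,             # AdamW, lr 2e-4
        output_dir="outputs",
    ),
)
trainer.train()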
Evaluation
Testing Data, Factors & Metrics
- Testing Data: Validation split from template dataset
- Factors: Content quality, relevance, diversity, field-adaptive retrieval performance
- Metrics:
- BLEU score
- ROUGE score
- Human evaluation scores
- Query relevance metrics
- Retrieval accuracy metrics
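The lexical metrics can be reproduced with standard tooling. A minimal sketch using the nltk and rouge_score packages; the evaluation script itself is not published with this card:

# Minimal sketch of BLEU/ROUGE scoring for a generated query against a reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "modern business presentation template"
candidate = "modern corporate presentation template"

bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,  # smooth short sequences
)
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(reference, candidate)
print(f"BLEU: {bleu:.3f}, ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")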
Results
- BLEU Score: ~0.75
- ROUGE Score: ~0.80
- Performance: Optimized for query generation quality in structured document retrieval
- Domain: High performance on presentation template metadata
Environmental Impact
- Hardware Type: NVIDIA GPU
- Hours used: ~3 hours
- Cloud Provider: Not specified (local or cloud GPU)
- Carbon Emitted: Minimal (LoRA training with efficient Unsloth framework)
Technical Specifications
Model Architecture and Objective
- Base Architecture: Google Gemma-3-4B transformer decoder
- Adaptation: LoRA adapters for parameter-efficient fine-tuning
- Objective: Generate relevant search queries from template metadata for field-adaptive dense retrieval
- Input: Template metadata (title, description, industries, categories, tags)
- Output: Generated search queries for structured document retrieval
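Downstream, the raw generation is typically split into individual queries. A small post-processing sketch; the numbered-list output format is an assumption about typical model output, not a guaranteed contract:

import re

# Split a raw generation into individual queries, stripping list numbering
# or bullet markers if the model emits them (format is an assumption).
def parse_queries(generated_text):
    queries = []
    for line in generated_text.splitlines():
        line = re.sub(r"^\s*(?:\d+[.)]\s*|[-*]\s*)", "", line).strip()
        if line:
            queries.append(line)
    return queries

print(parse_queries("1. modern business slides\n2. clean corporate deck"))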
Compute Infrastructure
- Hardware: NVIDIA GPU
- Software: PyTorch, Transformers, PEFT, Unsloth
Citation
Paper:
@article{field_adaptive_dense_retrieval,
  title   = {Field-Adaptive Dense Retrieval of Structured Documents},
  author  = {Syed, Mudasir},
  journal = {DBPIA},
  year    = {2024},
  url     = {https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE12352544}
}
Model:
@misc{field_adaptive_query_generator,
  title        = {Field-adaptive-query-generator for Presentation Template Query Generation},
  author       = {Syed, Mudasir},
  year         = {2024},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/mudasir13cs/Field-adaptive-query-generator}
}
APA: Syed, M. (2024). Field-adaptive-query-generator for Presentation Template Query Generation. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-query-generator
Model Card Authors
Mudasir Syed (mudasir13cs)
Model Card Contact
- GitHub: https://github.com/mudasir13cs
- Hugging Face: https://huggingface.co/mudasir13cs
- LinkedIn: https://pk.linkedin.com/in/mudasir-sayed
Framework versions
- Transformers: 4.35.0+
- PEFT: 0.16.0+
- PyTorch: 2.0.0+
- Unsloth: Latest