# Indic Speaker Embedding Model (Fine-tuned)

Fine-tuned speaker embedding model for Indian languages, based on pyannote's wespeaker-voxceleb-resnet34-LM.
## Model Description
This model was fine-tuned on 112K+ audio samples from:
- IndicVoices: 22 Indian languages, massive speaker diversity
- Kathbath: 12 Indian languages
## Training Details
- Base Model: pyannote/wespeaker-voxceleb-resnet34-LM
- Embedding Dimension: 256
- Training Samples: 84,741
- Validation Samples: 17,161
- Held-out for EER: 10,317
- Total Speakers: 3,975 (training) + 442 (held-out)
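The split sizes above add up to the "112K+" total quoted in the description; a quick sanity check:

```python
# Sanity check: the three splits sum to the "112K+ audio samples" total.
train, val, held_out = 84_741, 17_161, 10_317
total = train + val + held_out
print(total)  # 112219
```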
### Training Configuration
- Phase 1: 5 epochs with frozen backbone (head only)
- Phase 2: 15 epochs full fine-tuning
- Augmentations: 13 types (noise, reverb, pitch shift, etc.)
- Label smoothing: 0.1
- Dropout: 0.3
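The two-phase schedule above can be sketched as follows. This is an illustrative outline, not the actual training code: the module names, optimizer, and learning rates are assumptions, and the backbone here is a toy stand-in for the ResNet34.

```python
import torch
import torch.nn as nn

# Toy stand-in for the embedding backbone + classification head
# (the real backbone is pyannote's ResNet34; dimensions below match the card:
#  256-dim embeddings, 3,975 training speakers, dropout 0.3).
backbone = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))
head = nn.Sequential(nn.Dropout(0.3), nn.Linear(256, 3975))

# Cross-entropy with label smoothing 0.1, as listed above
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

# Phase 1: frozen backbone, train the classification head only (5 epochs)
set_requires_grad(backbone, False)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
# ... train head for 5 epochs ...

# Phase 2: unfreeze everything for full fine-tuning (15 epochs)
set_requires_grad(backbone, True)
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-4
)
# ... fine-tune end-to-end for 15 epochs ...
```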
## Results
| Metric | Value |
|---|---|
| Best Val Accuracy | 91.4% |
| Best EER | 4.18% |
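EER (Equal Error Rate) is the operating point where the false-acceptance and false-rejection rates are equal; lower is better. A minimal NumPy sketch on toy scores (not the actual held-out evaluation):

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: threshold where false-accept rate == false-reject rate.
    scores: similarity scores; labels: 1 = same speaker, 0 = different speaker."""
    order = np.argsort(scores)
    scores, labels = np.asarray(scores)[order], np.asarray(labels)[order]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Sweep a threshold over each score: accept pairs scoring >= threshold
    frr = np.cumsum(labels) / n_pos                # positives rejected so far
    far = (n_neg - np.cumsum(1 - labels)) / n_neg  # negatives still accepted
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Toy same-speaker (1) vs different-speaker (0) similarity scores
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(compute_eer(scores, labels))  # 0.0 -- perfectly separable toy data
```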
## Usage
```python
import torch
from pyannote.audio import Model

# Load the base model architecture
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

# Load the fine-tuned weights.
# Note: this checkpoint also includes the classification head used during
# fine-tuning; assuming it is a plain state dict, strict=False keeps only
# the weights that match the embedding model.
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(checkpoint, strict=False)
```
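For speaker verification, two embeddings are typically compared with cosine similarity. A minimal sketch using random stand-in vectors (in practice each 256-dim embedding would come from running the model on an audio segment, and the decision threshold would be tuned on a dev set):

```python
import torch
import torch.nn.functional as F

# Stand-ins for two 256-dim speaker embeddings; real ones would be produced
# by the fine-tuned model from two audio segments.
emb_a = torch.randn(256)
emb_b = torch.randn(256)

score = F.cosine_similarity(emb_a.unsqueeze(0), emb_b.unsqueeze(0)).item()
same_speaker = score > 0.5  # illustrative threshold, not from the model card
print(f"cosine similarity: {score:.3f}")
```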
## Intended Use
- Speaker diarization for Indian language audio
- Speaker verification/identification
- Bengali speaker diarization (DLSPRINT challenge)