
Indic Speaker Embedding Model (Fine-tuned)

A speaker embedding model for Indian languages, fine-tuned from pyannote/wespeaker-voxceleb-resnet34-LM.

Model Description

This model was fine-tuned on 112K+ audio samples from:

  • IndicVoices: 22 Indian languages, broad speaker diversity
  • Kathbath: 12 Indian languages

Training Details

  • Base Model: pyannote/wespeaker-voxceleb-resnet34-LM
  • Embedding Dimension: 256
  • Training Samples: 84,741
  • Validation Samples: 17,161
  • Held-out for EER: 10,317
  • Total Speakers: 3,975 (training) + 442 (held-out)

Training Configuration

  • Phase 1: 5 epochs with frozen backbone (head only)
  • Phase 2: 15 epochs full fine-tuning
  • Augmentations: 13 types (noise, reverb, pitch shift, etc.)
  • Label smoothing: 0.1
  • Dropout: 0.3
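The two-phase schedule above (head-only warm-up, then full fine-tuning) follows a standard PyTorch freeze/unfreeze pattern. A minimal sketch, using small placeholder modules in place of the real ResNet34 backbone and speaker-classification head (layer sizes and learning rates here are illustrative assumptions, not the actual training hyperparameters):

```python
import torch
import torch.nn as nn

# Placeholder stand-ins; the real backbone is wespeaker-voxceleb-resnet34-LM
backbone = nn.Linear(64, 256)    # pretend ResNet34 backbone -> 256-dim embedding
head = nn.Linear(256, 3975)      # classification head over the training speakers
model = nn.Sequential(backbone, head)

# Loss with label smoothing 0.1, as in the training configuration
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Phase 1: freeze the backbone, train the head only (5 epochs)
for p in backbone.parameters():
    p.requires_grad = False
phase1_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(phase1_params, lr=1e-3)  # illustrative lr
# ... train for 5 epochs ...

# Phase 2: unfreeze everything for full fine-tuning (15 epochs)
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative lr
# ... train for 15 epochs ...
```

Restricting the phase-1 optimizer to `requires_grad` parameters keeps the pretrained backbone intact while the randomly initialized head converges, which stabilizes the later full fine-tune.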

Results

  Metric              Value
  ------------------  ------
  Best Val Accuracy   91.4%
  Best EER            4.18%
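Equal Error Rate (EER) is the operating point where the false-accept rate on impostor trials equals the false-reject rate on genuine trials. A self-contained sketch of how it can be computed from verification scores (the score values below are toy data, not model outputs):

```python
def compute_eer(genuine_scores, impostor_scores):
    """Sweep thresholds; return the rate where FAR and FRR cross."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(genuine_scores + impostor_scores):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy similarity scores for same-speaker and different-speaker trials
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.5, 0.4, 0.3, 0.65]
print(compute_eer(genuine, impostor))  # → 0.25
```

A lower EER means genuine and impostor score distributions overlap less; the 4.18% above was measured on the held-out speakers.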

Usage

import torch
from pyannote.audio import Model

# Load the base architecture
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

# Load fine-tuned weights
checkpoint = torch.load("checkpoint.pt", map_location="cpu")

# Note: the checkpoint also includes a classification head used only during
# training. The state-dict key below is an assumption; adjust it to match
# how the checkpoint was saved. strict=False skips the head's extra keys.
model.load_state_dict(checkpoint["model_state_dict"], strict=False)
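Once embeddings are extracted (for example via pyannote.audio's `Inference` wrapper with `window="whole"`), verification reduces to comparing 256-dimensional vectors. A minimal cosine-similarity sketch using short toy vectors in place of real embeddings; the decision threshold is a hypothetical value that should be tuned on held-out trials:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim stand-ins for the model's 256-dim embeddings
emb_a = [0.1, 0.3, 0.5, 0.2]
emb_b = [0.1, 0.28, 0.52, 0.19]

THRESHOLD = 0.7  # hypothetical; calibrate against a target EER in practice
same_speaker = cosine_similarity(emb_a, emb_b) >= THRESHOLD
print(same_speaker)  # → True
```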

Intended Use

  • Speaker diarization for Indian language audio
  • Speaker verification/identification
  • Bengali speaker diarization (DLSPRINT challenge)
