Qwen3-TTS Fine-tuned on LJSpeech

This model is a fine-tuned version of Qwen/Qwen3-TTS-12Hz-1.7B-Base, trained on a 200-sample subset of the LJSpeech dataset.

Model Description

  • Base Model: Qwen3-TTS-12Hz-1.7B-Base
  • Training Data: LJSpeech-1.1 (200-sample subset)
  • Voice: Linda Johnson (female, American English)
  • Training: 3 epochs, loss reduced from 20.4 to approximately 10.7
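
The card does not say how the 200 samples were chosen from LJSpeech-1.1. As a minimal sketch, assuming a seeded random sample over the dataset's pipe-delimited metadata.csv (columns: ID, transcription, normalized transcription), a subset could be drawn as follows. The function name `select_subset`, its parameters, and the example rows are hypothetical and are not taken from the training script.

```python
import csv
import random


def select_subset(metadata_lines, n=200, seed=0):
    """Pick up to n (clip_id, text) pairs from LJSpeech-style metadata rows.

    Assumption: random sampling with a fixed seed. The original selection
    method for this model is not documented.
    """
    # QUOTE_NONE matters: transcripts contain literal double quotes.
    reader = csv.reader(metadata_lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        clip_id = fields[0]
        # Prefer the normalized transcription (third column) when present.
        text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
        rows.append((clip_id, text))
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Made-up rows in the metadata.csv layout, for illustration only.
example_rows = [
    "LJ001-0001|Chapter 1 begins here.|Chapter one begins here.",
    "LJ001-0002|A second line.|A second line.",
    "LJ001-0003|He said \"yes\".|He said \"yes\".",
]
subset = select_subset(example_rows, n=2, seed=0)
```

With the real dataset, the same function can be given an open file handle, for example `open("LJSpeech-1.1/metadata.csv", encoding="utf-8")`.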

Voice Characteristics

The model produces speech in the voice of Linda Johnson, featuring:

  • Clear, professional female voice
  • American English accent
  • Natural reading style (audiobook quality)
  • Consistent tone and pacing

Use Cases

  • Audiobook narration - Professional reading voice for long-form content
  • Virtual assistants - Clear, friendly voice for AI applications
  • Accessibility tools - Text-to-speech for visually impaired users
  • Content creation - Voiceovers for videos and presentations
  • Educational content - Clear pronunciation for learning materials

Training Details

Parameter          Value
Epochs             3
Batch Size         1 (gradient accumulation: 4)
Learning Rate      5e-6
Mixed Precision    bf16
Starting Loss      20.4
Final Loss         ~10.7
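
The values above imply a small number of optimizer updates. A rough calculation, assuming a single device, that every one of the 200 samples is seen once per epoch, and that no partial accumulation window is dropped (none of which the card states explicitly):

```python
import math

# Values taken from the model card.
num_samples = 200
epochs = 3
batch_size = 1
grad_accum_steps = 4
starting_loss = 20.4
final_loss = 10.7  # reported as approximate

# Gradients from 4 micro-batches of size 1 are accumulated per update.
effective_batch = batch_size * grad_accum_steps

# Assumption: single device, no samples dropped.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Relative loss reduction over training.
relative_drop = (starting_loss - final_loss) / starting_loss

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"loss reduction: {relative_drop:.1%}")
```

Under these assumptions the run amounts to 150 optimizer updates at an effective batch size of 4, with the loss falling by a little under half.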

License and Attribution

  • Training Data: LJSpeech dataset (Public Domain)
  • Base Model: Qwen3-TTS (Apache 2.0)
  • This Fine-tuned Model: Apache 2.0

Credits

  • Original recordings by Linda Johnson
  • LJSpeech dataset by Keith Ito
  • Base model by Qwen Team