Qwen3-TTS Fine-tuned on LJSpeech
This model is a fine-tuned version of Qwen/Qwen3-TTS-12Hz-1.7B-Base, trained on a 200-sample subset of the LJSpeech dataset.
Model Description
- Base Model: Qwen3-TTS-12Hz-1.7B-Base
- Training Data: LJSpeech-1.1 (200-sample subset)
- Voice: Linda Johnson (female, American English)
- Training: 3 epochs; training loss fell from 20.4 to about 10.7
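
This card does not say how the 200 samples were chosen. As a minimal sketch, assuming a seeded random draw, a subset of that size could be selected from the LJSpeech `metadata.csv` file, which stores one clip per line as `id|transcription|normalized transcription`. The function name `select_subset` and the seed value are hypothetical and are not taken from the training script.

```python
import csv
import io
import random


def select_subset(metadata_text, n=200, seed=0):
    """Return up to n (clip_id, normalized_text) pairs from LJSpeech metadata.

    Assumption: samples are drawn at random with a fixed seed. The actual
    selection method used for this model is not documented.
    """
    # LJSpeech transcripts contain literal double quotes, so quoting is disabled.
    reader = csv.reader(
        io.StringIO(metadata_text), delimiter="|", quoting=csv.QUOTE_NONE
    )
    # Keep only well-formed rows: id, raw transcription, normalized transcription.
    rows = [(r[0], r[2]) for r in reader if len(r) == 3]
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))
```

With a local copy of the dataset, the input would be the contents of `LJSpeech-1.1/metadata.csv` read as UTF-8, and each returned clip ID maps to an audio file at `wavs/<clip_id>.wav`.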
Voice Characteristics
The model produces speech in the voice of Linda Johnson, featuring:
- Clear, professional female voice
- American English accent
- Read-speech style, reflecting the audiobook recordings in LJSpeech
- Consistent tone and pacing
Use Cases
- Audiobook narration - Professional reading voice for long-form content
- Virtual assistants - Clear, friendly voice for AI applications
- Accessibility tools - Text-to-speech for visually impaired users
- Content creation - Voiceovers for videos and presentations
- Educational content - Clear pronunciation for learning materials
Training Details
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch Size | 1 (gradient accumulation: 4) |
| Learning Rate | 5e-6 |
| Mixed Precision | bf16 |
| Starting Loss | 20.4 |
| Final Loss | ~10.7 |
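
The settings in the table imply a short training schedule. The arithmetic below is a rough sketch that assumes single-device training on all 200 samples and one optimizer step per accumulation group. Neither assumption is confirmed by this card, and the variable names are illustrative.

```python
import math

# Values taken from this model card.
num_samples = 200
batch_size = 1
grad_accum = 4
epochs = 3
start_loss, final_loss = 20.4, 10.7

# Assumption: single device, so effective batch = batch size x accumulation steps.
effective_batch = batch_size * grad_accum
# Assumption: a trailing partial accumulation group would still trigger a step.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs
relative_drop = (start_loss - final_loss) / start_loss

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"relative loss reduction: {relative_drop:.1%}")
```

Under these assumptions the run performs 150 optimizer updates in total, and the reported loss drops by a little under half.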
License and Attribution
- Training Data: LJSpeech dataset (Public Domain)
- Base Model: Qwen3-TTS (Apache 2.0)
- This Fine-tuned Model: Apache 2.0
Credits