zero0303/qwen3-tts-ljspeech-finetuned

Qwen3-TTS Fine-tuned on LJSpeech

This model is a version of Qwen/Qwen3-TTS-12Hz-1.7B-Base fine-tuned on a 200-sample subset of the LJSpeech-1.1 dataset.

Model Description

Base Model: Qwen/Qwen3-TTS-12Hz-1.7B-Base
Training Data: LJSpeech-1.1 (200-sample subset)
Voice: Linda Johnson (female, American English)
Training: 3 epochs; loss decreased from 20.4 to approximately 10.7
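
The card does not say how the 200-sample subset was chosen. As a minimal sketch only, the snippet below takes the first N rows of an LJSpeech-style metadata.csv, which is pipe-delimited with the columns clip ID, raw transcription, and normalized transcription. The helper name read_ljspeech_subset, the "first N rows" rule, and the sample rows are assumptions made for illustration, not the author's actual preprocessing.

```python
import csv
import io


def read_ljspeech_subset(metadata_file, limit=200):
    """Return up to `limit` (clip_id, normalized_text) pairs.

    Hypothetical helper: it simply keeps the first `limit` well-formed rows.
    """
    # LJSpeech transcripts contain literal quote characters, so quoting is disabled.
    reader = csv.reader(metadata_file, delimiter="|", quoting=csv.QUOTE_NONE)
    subset = []
    for row in reader:
        if len(row) < 3:
            continue  # skip rows that lack the normalized transcription column
        clip_id, normalized_text = row[0], row[2]
        subset.append((clip_id, normalized_text))
        if len(subset) == limit:
            break
    return subset


# Made-up rows standing in for the real metadata.csv.
sample = io.StringIO(
    "LJ001-0001|Chapter 1 of 2|Chapter one of two\n"
    "LJ001-0002|Chapter 2 of 2|Chapter two of two\n"
)
pairs = read_ljspeech_subset(sample, limit=1)
print(pairs)
```

With the real dataset, the same function could be called on an open handle to metadata.csv (opened with encoding="utf-8"), and each clip ID would map to a file named after it in the wavs directory.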

Voice Characteristics

The model produces speech in the voice of Linda Johnson, featuring:

Clear, professional female voice
American English accent
Natural reading style, in the manner of the audiobook-style source recordings
Consistent tone and pacing

Use Cases

Audiobook narration - Professional reading voice for long-form content
Virtual assistants - Clear, friendly voice for AI applications
Accessibility tools - Text-to-speech for visually impaired users
Content creation - Voiceovers for videos and presentations
Educational content - Clear pronunciation for learning materials

Training Details

Parameter	Value
Epochs	3
Batch Size	1 (gradient accumulation: 4)
Learning Rate	5e-6
Mixed Precision	bf16
Starting Loss	20.4
Final Loss	~10.7
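
To put these settings in perspective, the arithmetic below works out the effective batch size and the number of optimizer updates they imply. It assumes that all 200 samples were used for training (no held-out split), that training ran on a single device, and that a final partial accumulation window counts as one step. None of these details is stated in the card.

```python
import math

# Values taken from the table above.
num_samples = 200      # size of the LJSpeech subset
epochs = 3
batch_size = 1
grad_accum = 4
starting_loss = 20.4
final_loss = 10.7      # reported as approximate

# Assumption: single device, so the effective batch is batch size x accumulation.
effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs
relative_loss_drop = (starting_loss - final_loss) / starting_loss

print(f"effective batch size: {effective_batch}")         # 4
print(f"optimizer steps per epoch: {steps_per_epoch}")    # 50
print(f"total optimizer steps: {total_steps}")            # 150
print(f"relative loss reduction: {relative_loss_drop:.1%}")  # 47.5%
```

Under these assumptions the run amounts to roughly 150 parameter updates at a learning rate of 5e-6, which is a light adaptation of the base model rather than an extended training run.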

License and Attribution

Training Data: LJSpeech dataset (Public Domain)
Base Model: Qwen3-TTS (Apache 2.0)
This Fine-tuned Model: Apache 2.0

Credits

Original recordings by Linda Johnson
LJSpeech dataset by Keith Ito
Base model by Qwen Team
