A clearer demo for TADA (now multilingual)
I improved the public demo for TADA, a generative framework for speech modeling via text-acoustic dual alignment.
TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.
The original demo already exposed these mechanisms, but its workflow made the pipeline hard to follow.
This updated demo makes the process clearer:
• load the model
• prepare a reference voice (optionally with a transcript, or with Whisper auto-transcription)
• generate speech conditioned on that reference
It also adds multilingual support.
Presets are included for a few languages, but the model supports more.