Top Contributors: Profile Followers

community

https://huggingface.co/spaces/mvaloatto/TCTF

Activity Feed

AI & ML interests

👥 User profiles with the most cumulative all-time followers at the end of each month

Recent Activity

akhaliq submitted a paper about 24 hours ago

Multimodal OCR: Parse Anything from Documents

akhaliq submitted a paper about 1 month ago

SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization

akhaliq submitted a paper about 1 month ago

Visual Personalization Turing Test

View all activity

akhaliq

submitted a paper to Daily Papers about 24 hours ago

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published 4 days ago • 17

fffiloni

posted an update 4 days ago

Post

336

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

akhaliq

submitted 3 papers to Daily Papers about 1 month ago

submitted 2 papers to Daily Papers about 2 months ago

Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

Paper • 2601.14253 • Published Jan 20 • 10

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Paper • 2601.09499 • Published Jan 14 • 9

akhaliq

submitted 4 papers to Daily Papers 2 months ago

UM-Text: A Unified Multimodal Model for Image Understanding

Paper • 2601.08321 • Published Jan 13 • 11

ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Paper • 2601.03955 • Published Jan 7 • 3

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Paper • 2512.24724 • Published Dec 31, 2025 • 8

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Paper • 2512.24766 • Published Dec 31, 2025 • 9

lllyasviel

submitted a paper to Daily Papers 3 months ago

Pretraining Frame Preservation in Autoregressive Video Memory Compression

Paper • 2512.23851 • Published Dec 29, 2025 • 25

akhaliq

submitted 3 papers to Daily Papers 3 months ago

What matters for Representation Alignment: Global Information or Spatial Structure?

Paper • 2512.10794 • Published Dec 11, 2025 • 9

Towards a Science of Scaling Agent Systems

Paper • 2512.08296 • Published Dec 9, 2025 • 16

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Paper • 2512.07843 • Published Nov 24, 2025 • 22

merve

posted an update 5 months ago

Post

9789

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages

4 replies

akhaliq

authored a paper 5 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 39

merve

posted an update 6 months ago

Post

6936

large AI labs open-sourced a ton of models last week 🔥
here's few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking

2 replies

merve

posted an update 6 months ago

Post

3488

IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter but also can do document question answering, understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗

merve

posted an update 6 months ago

Post

1251

a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac

AI & ML interests

Recent Activity

Team members 5

TopContributors-ProfileFollowers's activity