Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published 24 days ago • 10
Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding Paper • 2508.11818 • Published Aug 15
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19 • 7
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge Paper • 2505.07365 • Published May 12
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models Paper • 2507.08128 • Published Jul 10 • 10
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024 • 23
Do Audio-Language Models Understand Linguistic Variations? Paper • 2410.16505 • Published Oct 21, 2024 • 1
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation Paper • 2410.13198 • Published Oct 17, 2024 • 10
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds Paper • 2409.09213 • Published Sep 13, 2024 • 13
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup Paper • 2211.01246 • Published Nov 2, 2022
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP Paper • 2404.00415 • Published Mar 30, 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17, 2024 • 24
VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap Paper • 2405.15683 • Published May 24, 2024
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations Paper • 2210.02592 • Published Oct 5, 2022 • 2
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models Paper • 2310.08753 • Published Oct 12, 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP Paper • 2310.15799 • Published Oct 24, 2023
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation Paper • 2303.05668 • Published Mar 10, 2023 • 1