- Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
  Paper • 2505.15957 • Published • 3
- Roadmap towards Superhuman Speech Understanding using Large Language Models
  Paper • 2410.13268 • Published • 34
- StressTest: Can YOUR Speech LM Handle the Stress?
  Paper • 2505.22765 • Published • 17
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
  Paper • 2411.05361 • Published • 3
Collections
Collections including paper arxiv:2408.01337
- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 11
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 16
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 22

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12

- Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
  Paper • 2407.20445 • Published • 23
- LP-MusicCaps: LLM-Based Pseudo Music Captioning
  Paper • 2307.16372 • Published • 38
- The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
  Paper • 2311.10057 • Published • 1
- MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
  Paper • 2408.01337 • Published • 12

- A Novel 1D State Space for Efficient Music Rhythmic Analysis
  Paper • 2111.00704 • Published
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 55
- Music Style Transfer with Time-Varying Inversion of Diffusion Models
  Paper • 2402.13763 • Published • 11
- ChatMusician: Understanding and Generating Music Intrinsically with LLM
  Paper • 2402.16153 • Published • 60