Collections

Collections including paper arxiv:2408.01337

Evaluations of Large Audio-Language Models (LALMs)

This collection contains papers for various LALM evaluation frameworks.

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

Paper • 2505.15957 • Published May 21 • 3
Roadmap towards Superhuman Speech Understanding using Large Language Models

Paper • 2410.13268 • Published Oct 17, 2024 • 34
StressTest: Can YOUR Speech LM Handle the Stress?

Paper • 2505.22765 • Published May 28 • 17
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Paper • 2411.05361 • Published Nov 8, 2024 • 3

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 22

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 24
Neurons in Large Language Models: Dead, N-gram, Positional

Paper • 2309.04827 • Published Sep 9, 2023 • 17
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 10
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

Paper • 2309.03907 • Published May 18, 2023 • 12

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

Paper • 2407.20445 • Published Jul 29, 2024 • 23
LP-MusicCaps: LLM-Based Pseudo Music Captioning

Paper • 2307.16372 • Published Jul 31, 2023 • 38
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Paper • 2311.10057 • Published Nov 16, 2023 • 1
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Paper • 2408.01337 • Published Aug 2, 2024 • 12

A Novel 1D State Space for Efficient Music Rhythmic Analysis

Paper • 2111.00704 • Published Nov 1, 2021
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
Music Style Transfer with Time-Varying Inversion of Diffusion Models

Paper • 2402.13763 • Published Feb 21, 2024 • 11
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 60
