Xiaohan Xu

Tebmer

https://tebmer.github.io/

tebmer

AI & ML interests

Text-to-SQL

Recent Activity

upvoted a paper 15 days ago

Budget-Aware Tool-Use Enables Effective Agent Scaling

updated a dataset about 1 month ago

birdsql/mini-interact

updated a dataset about 1 month ago

birdsql/bird-interact-full

upvoted a paper 15 days ago

Budget-Aware Tool-Use Enables Effective Agent Scaling

Paper • 2511.17006 • Published Nov 21 • 29

updated 3 datasets about 1 month ago

published a dataset about 2 months ago

birdsql/mini-interact

Viewer • Updated Nov 19 • 300 • 263

upvoted 4 papers 3 months ago

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

Paper • 2510.13809 • Published Oct 15 • 37

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28 • 174

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2, 2024 • 64

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Paper • 2510.08189 • Published Oct 9 • 26

authored a paper 3 months ago

BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions

Paper • 2510.05318 • Published Oct 6 • 21

upvoted 2 papers 3 months ago

BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions

Paper • 2510.05318 • Published Oct 6 • 21

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Paper • 2509.26490 • Published Sep 30 • 19

updated 2 datasets 3 months ago

birdsql/livesqlbench-base-full-v1

Viewer • Updated Sep 27 • 600 • 201 • 1

birdsql/livesqlbench-base-lite

Viewer • Updated Sep 27 • 270 • 108 • 2

upvoted a paper 4 months ago

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Paper • 2508.20096 • Published Aug 27 • 36

published a dataset 4 months ago

birdsql/bird-interact-full

Viewer • Updated Nov 19 • 600 • 328 • 1

liked a dataset 5 months ago

birdsql/livesqlbench-base-lite-sqlite

Viewer • Updated Nov 11 • 270 • 381 • 2

upvoted a paper 6 months ago

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 79

authored 2 papers 6 months ago

Leveraging Large Language Models for NLG Evaluation: A Survey

Paper • 2401.07103 • Published Jan 13, 2024 • 4

Re-Reading Improves Reasoning in Language Models

Paper • 2309.06275 • Published Sep 12, 2023 • 3
