ToolRM: Outcome Reward Models for Tool-Calling Large Language Models Paper • 2509.11963 • Published Sep 15, 2025 • 2
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls Paper • 2409.03797 • Published Sep 4, 2024
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning Paper • 2403.10692 • Published Mar 15, 2024 • 1
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks Paper • 2407.00121 • Published Jun 27, 2024
Formally Specifying the High-Level Behavior of LLM-Based Agents Paper • 2310.08535 • Published Oct 12, 2023
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7, 2024 • 25
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs Paper • 2402.15491 • Published Feb 23, 2024 • 16