MedCalc-Bench Collection Evaluating Large Language Models for Medical Calculations • 4 items • Updated 6 days ago
GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information Paper • 2304.09667 • Published Apr 19, 2023
RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision Paper • 2502.13957 • Published Feb 19
Benchmarking Retrieval-Augmented Generation for Chemistry Paper • 2505.07671 • Published May 12
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning Paper • 2506.02911 • Published Jun 3
AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning Paper • 2402.13225 • Published Feb 20, 2024
TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials Paper • 2505.16097 • Published May 22