Yale NLP Lab

university

https://nlp.cs.yale.edu/

yalenlp

yale-nlp

AI & ML interests

Natural Language Processing at Yale

Papers

Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers

yilunzhao

authored 14 papers 2 days ago

AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Paper • 2507.13300 • Published Jul 17, 2025 • 20

PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles

Paper • 2510.06475 • Published Oct 7, 2025 • 2

MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Paper • 2508.20867 • Published Aug 28, 2025

FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering

Paper • 2510.06426 • Published Oct 7, 2025 • 3

SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

Paper • 2506.04583 • Published Jun 5, 2025

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Paper • 2411.05764 • Published Nov 8, 2024

MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval

Paper • 2510.09510 • Published Oct 10, 2025 • 8

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Paper • 2510.15232 • Published Oct 17, 2025 • 6

LimRank: Less is More for Reasoning-Intensive Information Reranking

Paper • 2510.23544 • Published Oct 27, 2025 • 9

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

AlphaResearch: Accelerating New Algorithm Discovery with Language Models

Paper • 2511.08522 • Published Nov 11, 2025 • 18

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Paper • 2601.08763 • Published 26 days ago • 146

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

Paper • 2601.16125 • Published 17 days ago • 13

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

Paper • 2602.05975 • Published 3 days ago • 11

mikeweii

updated a collection 5 days ago

Anchor

Dataset and models for paper: ANCHOR: Branch-Point Data Generation for GUI Agents • 7 items • Updated 5 days ago

mikeweii

updated a model 5 days ago

yale-nlp/Qwen3-VL-8B-Anchor-Windows

770k • Updated 5 days ago • 11

mikeweii

published a model 5 days ago

yale-nlp/Qwen3-VL-8B-Anchor-Windows

770k • Updated 5 days ago • 11

mikeweii

updated a model 5 days ago

yale-nlp/Qwen2.5-VL-7B-Anchor-Windows

849k • Updated 5 days ago • 10

mikeweii

published a model 5 days ago

yale-nlp/Qwen2.5-VL-7B-Anchor-Windows

849k • Updated 5 days ago • 10