arxiv:2509.05440

Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too

Published on Sep 5

Authors:

Logan Lawrence ,

Abstract

A new direct-scoring method using synthetic summaries performs comparably to state-of-the-art pairwise evaluators in sample-level correlations for evaluating large-language models across different benchmarks.

AI-generated summary

As large-language models have been increasingly used as automatic raters for evaluating free-form content, including document summarization, dialog, and story generation, work has been dedicated to evaluating such models by measuring their correlations with human judgment. For sample-level performance, methods which operate by using pairwise comparisons between machine-generated text perform well but often lack the ability to assign absolute scores to individual summaries, an ability crucial for use cases that require thresholding. In this work, we propose a direct-scoring method which uses synthetic summaries to act as pairwise machine rankings at test time. We show that our method performs comparably to state-of-the-art pairwise evaluators in terms of axis-averaged sample-level correlations on the SummEval (+0.03), TopicalChat (-0.03), and HANNA (+0.05) meta-evaluation benchmarks, and release the synthetic in-context summaries as data to facilitate future work.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.05440 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.05440 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.05440 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.