SemEval-2026 Task 8: MTRAGEval: Evaluating Multi-Turn RAG Conversations

Sara Rosenthal; Yannis Katsis; Vraj Shah; Marina Danilevsky

ACL 2026

Workshop

02 Jul 2026


Abstract

We present the results and findings from SemEval-2026 Task 8: MTRAGEval. MTRAGEval measures three Retrieval Augmented Generation (RAG) subtasks on multi-turn conversations: A. Retrieval, B. Generate, and C. Retrieve+Generate (full RAG). The task is evaluated using MTRAG-UN, a new benchmark for multi-turn RAG that focuses on Unanswerable, Underspecified, Non-Standalone, and Unclear questions. MTRAGEval attracted strong participation, with 107 registered teams and 92 submissions across all subtasks. It yielded several findings on effective retrieval and query rewriting techniques, the use of ensemble models, and the compounding cost of retrieval errors on downstream generation quality.
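The three subtasks differ mainly in what the generator gets to see. The Python sketch below illustrates that relationship only; it is not the official task harness. The function names, the `Passage` type, the assumption that subtask B supplies reference passages, and the toy word-overlap retriever and generator are all hypothetical stand-ins. Subtask C simply chains A into B, so a passage missed at retrieval time is never seen by the generator, which is one way retrieval errors can carry over into generation quality.

```python
import string
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: a conversation is a list of (speaker, text) turns.
Turn = Tuple[str, str]


@dataclass
class Passage:
    pid: str
    text: str


def _words(text: str) -> set:
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip(string.punctuation).lower() for w in text.split()} - {""}


def toy_retrieve(conversation: List[Turn], corpus: List[Passage]) -> List[Passage]:
    # Hypothetical lexical retriever: score passages by word overlap with ALL
    # user turns, so earlier turns can help with non-standalone questions.
    query = set()
    for speaker, text in conversation:
        if speaker == "user":
            query |= _words(text)
    scored = [(len(query & _words(p.text)), p) for p in corpus]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep corpus order
    return [p for _, p in scored]


def toy_generate(conversation: List[Turn], passages: List[Passage]) -> str:
    # Hypothetical generator: answer from the top passage, or decline when no
    # passage is available (a crude stand-in for handling unanswerable questions).
    return passages[0].text if passages else "I don't know."


def subtask_a_retrieval(conversation, corpus, retrieve: Callable, k: int = 5) -> List[Passage]:
    """Subtask A (sketch): rank corpus passages for the latest turn, given the history."""
    return retrieve(conversation, corpus)[:k]


def subtask_b_generation(conversation, reference_passages, generate: Callable) -> str:
    """Subtask B (sketch): answer the latest turn from the passages provided."""
    return generate(conversation, reference_passages)


def subtask_c_full_rag(conversation, corpus, retrieve: Callable, generate: Callable, k: int = 5) -> str:
    """Subtask C (sketch): retrieve, then generate only from what was retrieved."""
    passages = subtask_a_retrieval(conversation, corpus, retrieve, k)
    return subtask_b_generation(conversation, passages, generate)


# Illustrative usage with a two-passage toy corpus.
corpus = [
    Passage("p1", "The Eiffel Tower is in Paris."),
    Passage("p2", "Paris is the capital of France."),
]
conv = [("user", "Where is the Eiffel Tower?")]

if __name__ == "__main__":
    print(subtask_c_full_rag(conv, corpus, toy_retrieve, toy_generate))
```

In this toy setup, subtask B can be scored independently of retrieval quality because the passages are handed in, whereas subtask C inherits whatever the retriever returns.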
