SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

Nedjma Ousidhoum; Junho Myung; Carla Perez-Almendros; Jiho Jin; Amr Keleg; Meriem Beloucif; Yi Zhou; Rodrigo Agerri; Vladimir Araujo; Naomi Baes; James Barry; Joanne Boisson; Nancy F. Chen; Christine De Kock; Aleksandra Edwards; Joseba Fernandez De Landa; Mohamed Fazli Imam; Huda Hakami; Shu-Kai Hsieh; Joseph Marvin Imperial; Roy Ka-Wei Lee; Chenyang Lyu; Younes Samih; Johan Sjons; Bryan Tan; Asahi Ushio; Weihua Zheng; Liu Zhengyuan; Alice Oh; Jose Camacho-Collados

ACL 2026

Workshop paper

02 Jul 2026

Abstract

We present our shared task on evaluating the adaptability of LLMs and NLP systems across multiple languages and cultures. The task data consist of an extended version of our manually constructed BLEND benchmark (Myung et al., 2024), covering more than 30 language–culture pairs, predominantly representing low-resource languages spoken across multiple continents. As the task is designed strictly for evaluation, participants were not permitted to use the data for training, fine-tuning, few-shot learning, or any other form of model adjustment. Participants were required to predict labels in two tracks: (a) Short-Answer Questions (SAQ) and (b) Multiple-Choice Questions (MCQ). They were allowed to submit any NLP system and adopt diverse modelling strategies, provided that the benchmark was used solely for evaluation. The task attracted more than 140 registered participants, and we received final submissions from 62 teams, along with 19 system description papers. We report the results and present an analysis of the best-performing systems and the most commonly adopted approaches. Furthermore, we discuss shared insights into open questions and challenges related to evaluation, misalignments, and methodological perspectives on model behaviour in low-resource languages and for under-represented cultures.

Conference paper