Leshem Choshen

Publications

Position: Agentic Systems Should be General
- - Elron Bandel
  - Asaf Yehudai
  - et al.
- 2026
- ICML 2026
Conference paper
Stop Guessing When to Stop Testing: Efficient Model Evaluation with Just Enough Data
- - Ofir Arviv
  - Kristjan Greenewald
  - et al.
- 2026
- ACL 2026
Conference paper
Mediocrity is the key for LLM as a Judge Anchor Selection
- - Shachar Don-Yehiya
  - Asaf Yehudai
  - et al.
- 2026
- ACL 2026
Conference paper
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation
- - Yotam Perlitz
  - Ariel Gera
  - et al.
- 2025
- NeurIPS 2025
Workshop paper
Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models
- - Anna A. Ivanova
  - Aalok Sathe
  - et al.
- 2025
- Transactions of the Association for Computational Linguistics
Paper
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
- - Shivalika Singh
  - Angelika Romanou
  - et al.
- 2025
- ACL 2025
Conference paper
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
- - Eliya Habba
  - Ofir Arviv
  - et al.
- 2025
- ACL 2025
Conference paper
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
- - Shachar Don-Yehiya
  - Leshem Choshen
  - et al.
- 2025
- ACL 2025
Demo paper
A Hitchhiker's Guide to Scaling Law Estimation
- - Leshem Choshen
  - Yang Zhang
  - et al.
- 2025
- ICML 2025
Conference paper
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
- - Rickard Gabrielsson
  - Jiacheng Zhu
  - et al.
- 2025
- ICML 2025
Conference paper

Top collaborators

Michal Shmueli-Scheuer

Distinguished Engineer, AI Benchmarking and Evaluation

Yotam Perlitz

Research Staff Member

Eyal Shnarch

Senior Research Scientist

Hadar Mulian

AI Research Scientist