tinyBenchmarks: evaluating LLMs with fewer examples. Felipe Maia Polo, Lucas Weber, et al. 2024. ICML 2024. Conference paper.
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI. Elron Bandel, Yotam Perlitz, et al. 2024. NAACL 2024. Demo paper.
Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs). Leshem Choshen, Ariel Gera, et al. 2024. LREC-COLING 2024. Tutorial.
Achieving Human Parity in Content-Grounded Datasets Generation. Asaf Yehudai, Boaz Carmeli, et al. 2024. ICLR 2024. Conference paper.
tinyBenchmarks: evaluating LLMs with fewer examples. Felipe Maia Polo, Lucas Weber, et al. 2024. ICLR 2024. Workshop paper.
Asymmetry in Low-Rank Adapters of Foundation Models. Jiacheng Zhu, Kristjan Greenewald, et al. 2024. ICLR 2024. Workshop paper.
TIES-Merging: Resolving Interference When Merging Models. Prateek Yadav, Derek Tam, et al. 2023. NeurIPS 2023. Conference paper.
Knowledge is a Region in Weight Space for Finetuned Language Models. Almog Gueta, Elad Venezian, et al. 2023. EMNLP 2023. Conference paper.
Where to start? Analyzing the potential value of intermediate models. Leshem Choshen, Elad Venezian, et al. 2023. EMNLP 2023. Conference paper.
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. Ella Neeman, Roee Aharoni, et al. 2023. ACL 2023. Conference paper.