Evaluating perturbation robustness of generative systems that use COBOL code inputs. Samuel Ackerman, Wesam Ibraheem, et al. 2026. ICSE 2026. Workshop paper.
PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code. Itay Dreyfuss, Antonio Abu Nassar, et al. 2026. ICSE 2026. Workshop paper.
Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding. Ziv Nevo, Orna Raz, et al. 2025. ASE 2025. Workshop paper.
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes. Gal Amram, Ora Nova Fandina, et al. 2025. ASE 2025. Workshop paper.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, et al. 2023. EMNLP 2023. Workshop paper.
Predicting Question-Answering Performance of Large Language Models through Semantic Consistency. Ella Rabinovich, Samuel Ackerman, et al. 2023. EMNLP 2023. Workshop paper.
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora. George Kour, Samuel Ackerman, et al. 2022. EMNLP 2022. Workshop paper.
Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation. George Kour, Marcel Zalmanovici, et al. 2022. AAAI 2022. Workshop paper.
Detecting model drift using polynomial relations. Eliran Roffe, Samuel Ackerman, et al. 2022. AAAI 2022. Workshop paper.