Assessing Confidence in Large Language Models by Classifying Task Correctness using Similarity Features. Debarun Bhattacharjya, Balaji Ganesan, et al. ICLR 2025.
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging. Aladin Djuhera, Swanand Ravindra Kadhe, et al. ICLR 2025.
Out-of-Distribution Detection using Synthetic Data Generation. Momin Abbas, Muneeza Azmat, et al. ICLR 2025.
Retention Score: Quantifying Jailbreak Risks for Vision Language Models. Zaitang Li, Pin-Yu Chen, et al. AAAI 2025.
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. Xiaomeng Xu, Pin-Yu Chen, et al. AAAI 2025.
PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks. Huzaifa Arif, Keerthiram Murugesan, et al. IEEE SaTML 2025.
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis. Chia-Yi Hsu, Jia-You Chen, et al. ICASSP 2025.
Towards Unbiased Evaluation of Time-series Anomaly Detector. Debarpan Bhattacharya, Sumanta Mukherjee, et al. ICASSP 2025.