Teaching VLMs to Localize Specific Objects from In-context Examples. Sivan Doveh, Nimrod Shabtay, et al. 2025. ICCV 2025. Conference paper.
Spoken question answering for visual queries. Nimrod Shabtay, Zvi Kons, et al. 2025. INTERSPEECH 2025. Conference paper.
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content. Nimrod Shabtay, Felipe Maia Polo, et al. 2025. ICLR 2025. Conference paper.