ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents. Ido Levy, Ben Wiesel, et al. 2026. ICLR 2026. Conference paper.
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents. Segev Shlomov, Ben Wiesel, et al. 2025. ECAI 2025. Conference paper.
ST-WEBAGENTBENCH: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents. Ido Levy, Ben Wiesel, et al. 2025. ICML 2025. Workshop paper.