OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Publication
NeurIPS 2024 Datasets and Benchmarks Track