OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Publication
arXiv