AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou,
Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang,
Fudan NLP Lab & Fudan Vision and Learning Lab
Correspondence to: zhxi22@m.fudan.edu.cn, {tgui,qz}@fudan.edu.cn

Abstract

Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations.


An illustration of self-evolution for generally-capable LLM-based agents.

AgentGym Suite


Overview of the AgentGym framework.

AgentGym is a framework designed to help the community easily evaluate and develop generally-capable LLM-based agents. It features diverse interactive environments and tasks with a unified format, i.e., the ReAct format. It supports real-time feedback and concurrency, and is easily scalable. It includes 14 environments spanning web navigation, text games, household tasks, digital games, embodied tasks, tool use, and programming.
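In the ReAct format, each agent turn interleaves a reasoning step, an action, and the resulting observation. As a minimal, hypothetical sketch (the exact prompt wording AgentGym uses is an assumption, not the real template):

```python
# Hypothetical sketch of serializing one ReAct-style turn.
# The "Thought:/Action:/Observation:" labels and the example content
# are illustrative; AgentGym's actual prompt template may differ.
def format_react_turn(thought: str, action: str, observation: str) -> str:
    return f"Thought: {thought}\nAction: {action}\nObservation: {observation}"

turn = format_react_turn(
    "I should search for a red mug under $20.",
    "search[red mug]",
    "Found 10 results ...",
)
print(turn)
```

Because every environment speaks this one format, trajectories collected anywhere in the suite can be pooled for training.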

It also includes a high-quality trajectory set AgentTraj and a benchmark suite AgentEval.

We also propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models.

Environment Traj Eval Original Repo EnvServer
WebShop 3930 200 WebShop-Repo agentenv-webshop
WebArena 0 20 WebArena agentenv-webarena
MAZE 215 25 MAZE-Repo agentenv-lmrlgym
Wordle 955 25 Wordle-Repo agentenv-lmrlgym
ALFWorld 2420 200 ALFWorld-Repo agentenv-alfworld
SciWorld 2120 200 SciWorld-Repo agentenv-sciworld
BabyAI 810 90 BabyAI-Repo agentenv-babyai
TextCraft 374 100 TextCraft-Repo agentenv-textcraft
Weather 311 20 Weather-Repo agentenv-tool
Movie 215 20 Movie-Repo agentenv-tool
Academia 0 20 Academia-Repo agentenv-tool
Sheet 0 20 Sheet-Repo agentenv-tool
TODOList 135 20 TODOList-Repo agentenv-tool
BIRD 3000 200 BIRD-Repo agentenv-sqlgym

Platform

The platform architecture of AgentGym is illustrated in the following figure. In AgentGym, different environments are deployed on separate servers or ports and expose encapsulated HTTP services. This decouples the environments from the rest of the framework.

These services include APIs such as /createEnv to create an environment, /observation to get the current observation from the environment, /available_actions to get the currently available actions, /step to perform an action, and /reset to reset the environment.
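To make the service design concrete, here is a runnable toy sketch of an environment server exposing these endpoints, plus a client calling them over HTTP. The endpoint names follow the list above; the counter-style environment, the JSON payload shapes, and the request conventions (POST vs. GET, `env_id` parameter) are assumptions for illustration only.

```python
# Toy sketch of AgentGym-style environment HTTP services. Endpoint names
# (/createEnv, /observation, /available_actions, /step, /reset) come from
# the description above; payload shapes and the toy env are assumptions.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ENVS = {}  # env_id -> mutable environment state


class Handler(BaseHTTPRequestHandler):
    def _send(self, obj):
        body = json.dumps(obj).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/createEnv":
            env_id = str(len(ENVS))
            ENVS[env_id] = {"steps": 0}
            self._send({"env_id": env_id})
        elif self.path == "/step":
            env = ENVS[data["env_id"]]
            env["steps"] += 1
            self._send({"observation": f"took action {data['action']}",
                        "done": env["steps"] >= 3})
        elif self.path == "/reset":
            ENVS[data["env_id"]] = {"steps": 0}
            self._send({"ok": True})

    def do_GET(self):
        path, _, query = self.path.partition("?")
        env_id = query.split("=")[1]
        if path == "/observation":
            self._send({"observation": f"{ENVS[env_id]['steps']} steps taken"})
        elif path == "/available_actions":
            self._send({"actions": ["left", "right"]})

    def log_message(self, *args):  # silence per-request logging
        pass


def call(base, path, payload=None, query=""):
    url = f"{base}{path}" + (f"?{query}" if query else "")
    if payload is None:
        req = urllib.request.Request(url)
    else:
        req = urllib.request.Request(
            url, data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

env_id = call(base, "/createEnv", {})["env_id"]
actions = call(base, "/available_actions", query=f"env_id={env_id}")["actions"]
result = call(base, "/step", {"env_id": env_id, "action": "left"})
print(env_id, actions, result)
server.shutdown()
```

Because each environment is just an HTTP service, many can run concurrently on different ports, and a new environment only needs to implement the same handful of endpoints.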

We have implemented 14 types of environments, and developers can easily add new environments to AgentGym by encapsulating the aforementioned interfaces. EnvClients consume the services provided by the servers and wrap them into functions for users to call. AgentController is the core component that connects the agent and the environment; it is responsible for evaluating the agent, collecting data, and training the agent.
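The agent-environment interaction that AgentController orchestrates can be sketched as a simple rollout loop. The `evaluate` function and the toy environment below are illustrative assumptions; they stand in for the real EnvClient and agent interfaces, which are not specified here.

```python
# Minimal sketch of an evaluation rollout in the spirit of AgentController.
# The agent/env function signatures and the toy door environment are
# illustrative assumptions, not AgentGym's actual interfaces.
from typing import Callable, Tuple


def evaluate(agent: Callable[[str], str],
             env_reset: Callable[[], str],
             env_step: Callable[[str], Tuple[str, float, bool]],
             max_turns: int = 10) -> float:
    """Roll out one episode and return the final reward."""
    observation = env_reset()
    reward = 0.0
    for _ in range(max_turns):
        action = agent(observation)            # agent decides from observation
        observation, reward, done = env_step(action)
        if done:
            break
    return reward


def make_toy_env():
    """Toy environment: the episode succeeds when the agent opens the door."""
    initial_obs = "You see a closed door."

    def reset() -> str:
        return initial_obs

    def step(action: str):
        done = action == "open door"
        obs = "The door is open." if done else initial_obs
        return obs, float(done), done

    return reset, step


reset, step = make_toy_env()
score = evaluate(lambda obs: "open door", reset, step)
print(score)  # 1.0
```

The same loop serves evaluation (score the final reward) and data collection (record the observation/action pairs along the way).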


AgentGym Platform Architecture.

Main Results


BibTeX


@misc{xi2024agentgym,
  title={AgentGym: Evolving Large Language Model-based Agents across Diverse Environments},
  author={Zhiheng Xi and Yiwen Ding and Wenxiang Chen and Boyang Hong and Honglin Guo and Junzhe Wang and Dingwen Yang and Chenyang Liao and Xin Guo and Wei He and Songyang Gao and Lu Chen and Rui Zheng and Yicheng Zou and Tao Gui and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Zuxuan Wu and Yu-Gang Jiang},
  year={2024},
  eprint={2406.04151},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}