{"data":{"kind":"file","path":"README.md","version_id":"cvubsgciyulgpt376pkpa549","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5271,"modified_at":"2026-03-17T16:40:41.311000","content_hash":"e2ee227a5a17346ea9727dbf07644318c1b23abf8dfdd888984d445628f1445d"},"entries":[],"content":"# gpt-world\n\nMulti-turn hex-grid pathfinding environment based on the GPT-World notebook.\n\nBoards are generated procedurally from the built-in templates each time the dataset is created.\n\n### Overview\n- **Environment ID**: `gpt-world`\n- **Short description**: Collect the key and then reach the goal on a hex-style board while avoiding walls.\n- **Tags**: `games`, `planning`, `tool-use`, `multiturn`, `pathfinding`\n\n### Task\n- **Type**: `tool use`\n- **Agent interface**: Call the `play_move` tool once per turn with one action: `UR`, `R`, `DR`, `DL`, `L`, `UL`, or `Pickup`.\n- **Board feedback**: After each turn, the environment sends an updated ASCII board so the agent can see the current state.\n- **Rubric overview**: Deterministic rewards based on valid play, key pickup, and final success.\n\n### Rules\n- You start at `@`.\n- You must collect `K` before `P` counts as a win.\n- `W` are blocked cells.\n- Invalid moves leave the board unchanged.\n- `Pickup` only works if the player is currently on the key cell.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run gpt-world\n```\n\nPick a subset of boards:\n\n```bash\nprime eval run gpt-world -a '{\"difficulty\":\"easy\",\"num_examples\":4}'\nprime eval run gpt-world -a '{\"difficulty\":\"hard\",\"num_examples\":4,\"max_turns\":96}'\n```\n\n### Copy-Paste Eval Command\n\n```bash\nset -a; source secrets.env >/dev/null 2>&1\n\nuv run prime eval run gpt-world \\\n  --model gpt-4o-mini \\\n  --api-base-url https://api.openai.com/v1 \\\n  --api-key-var OPENAI_API_KEY \\\n  --num-examples 25 \\\n  --rollouts-per-example 4 \\\n  --max-concurrent 4 \\\n  --sampling-args 
'{\"max_tokens\":16384,\"temperature\":0.7}' \\\n  --env-args '{\"difficulty\":\"all\",\"max_turns\":30}' \\\n  --state-columns termination_reason,turn_count,picked_key,invalid_actions,total_reward \\\n  --save-results \\\n  --tui\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `difficulty` | `str` | `\"all\"` | One of `all`, `easy`, `medium`, `hard`. |\n| `seed` | `int` | `0` | Seed used to generate board layouts deterministically. |\n| `num_examples` | `int` | `4096` | Number of generated examples to build inside the environment when not overridden in `env_args`. This is intentionally large so eval-time `--num-examples` can draw fresh boards instead of being capped by a tiny internal dataset. |\n| `eval_examples` | `int \\| null` | `null` | Number of eval examples to prebuild for the eval split; defaults to `num_examples`. |\n| `max_turns` | `int` | `128` | Maximum number of model turns before truncation. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Sum of deterministic step rewards |\n| `solved` | `1.0` iff the key was collected and the goal was reached |\n| `picked_key` | `1.0` iff the key was collected |\n| `invalid_actions` | Count of invalid turns |\n| `turn_count` | Number of turns taken |\n| `optimality_gap` | Extra turns above the shortest valid solution, or `-1` if unsolved |\n\n### Local Helpers\n\n- [`render_board`](gpt_world_core.py) renders the board seen by the agent.\n- [`shortest_solution`](gpt_world_core.py) computes the optimal action sequence for a board.\n\n### Credits\n\nThis environment is inspired by the original GPT-World project by Sasha Rush and collaborators:\n\n- [GPTWorld](https://github.com/srush/GPTworld)\n- [Original GPT-World 
notebook](https://colab.research.google.com/github/srush/GPTWorld-Challenge/blob/main/GPT4_game.ipynb)\n\nThis Prime environment adapts that idea into a deterministic multi-turn tool-use benchmark with per-turn board visualization.\n\n### Procedural Board Generation\n\nThe environment uses procedural generation by default. [`generate_boards.py`](generate_boards.py) procedurally generates solvable boards by re-sampling start, key, goal, and walls from the predefined templates, keeping each template's original board size and wall count. Run it to preview the kinds of boards the environment will produce:\n\n```bash\nuv run python generate_boards.py --count 8 --seed 7\nuv run python generate_boards.py --count 4 --difficulty hard\n```\n\n### RL Training Configs\n\nTwo starter RL configs live under [`configs/rl`](configs/rl):\n\n- [`gpt-world-local-smoke.toml`](configs/rl/gpt-world-local-smoke.toml) is a conservative self-managed `prime-rl` smoke run against the local `gpt-world` package.\n- [`gpt-world-qwen3-4b-instruct.toml`](configs/rl/gpt-world-qwen3-4b-instruct.toml) is a longer hosted-ready config. Replace the placeholder Hub id after pushing the environment.\n\nLocal smoke run:\n\n```bash\nprime env install gpt-world\nuv run prime-rl configs/rl/gpt-world-local-smoke.toml\n```\n\nHosted-ready flow:\n\n```bash\nprime env push gpt-world --visibility PRIVATE\n# then update configs/rl/gpt-world-qwen3-4b-instruct.toml with your Hub id\n```\n","encoding":"utf-8","truncated":false,"total_bytes":5271},"status":null}