{"data":{"kind":"file","path":"README.md","version_id":"wzez1eu0wsmjxk1usiniyld6","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5359,"modified_at":"2026-03-23T00:19:30.780000","content_hash":"77fc780ba5a24ae55d810b2030ef6e7debe82d70b06ef1e337c1e5bfe2c9a150"},"entries":[],"content":"# arc-agi-3-env\n\n### Overview\n- **Environment ID**: `arc-agi-3-env`\n- **Short description**: Multi-turn ARC-style game environment for PrimeIntellect using `arcengine`, with support for `SimpleMaze`, `ComplexMaze`, and ARC-AGI tasks\n- **Tags**: `game`, `arc`, `arc-agi`, `maze`, `rl`, `multi-turn`\n\n### Datasets\n- **Primary dataset(s)**: Built-in `arcengine` maze levels plus ARC-AGI tasks fetched from the ARC API\n- **Source links**: `arcengine` package for local game definitions, `GET {ROOT_URL}/api/games` for ARC-AGI task discovery\n- **Split sizes**: Configurable at load time; maze levels are built in, ARC-AGI task count depends on fetched games and evaluation settings\n\n### Task\n- **Type**: multi-turn\n- **Output format expectations**: XML action tags. For movement: `<action>ACTION1</action>`. 
For click actions: `<action>ACTION6</action><x>12</x><y>34</y>`\n- **Rubric overview**: Binary task reward (`WIN` => 1, else 0), action-count tracking, and timeout tracking\n\n### Quickstart\n\nConfigure the repo-level `.env`:\n\n```bash\nARC_API_KEY=your_api_key\nROOT_URL=https://three.arcprize.org\nOPENAI_API_KEY=your_openai_key\n```\n\nRun a local smoke evaluation:\n\n```bash\nset -a\nsource .env\nset +a\nPYTHONPATH=environments/arc_agi_3_env uv run --project environments/arc_agi_3_env vf-eval arc-agi-3-env \\\n  -m gpt-4.1-mini \\\n  -b https://api.openai.com/v1 \\\n  -k OPENAI_API_KEY \\\n  -a '{\"game_family\":\"simple_maze\",\"level_index\":0,\"max_turns\":8}' \\\n  -n 1 -r 1\n```\n\nNotes:\n- Run these commands from the repo root.\n- The repo `.env` is loaded automatically by `arc_agi_3_env.py`, but `vf-eval` itself still needs the shell to see `OPENAI_API_KEY`, so source `.env` before running the command.\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- `ARC_API_KEY` and `ROOT_URL` are only required for `game_family=\"arc_agi\"`.\n- `vf-eval` currently needs `PYTHONPATH=environments/arc_agi_3_env` when run manually from the repo root.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `game_family` | str | `\"simple_maze\"` | Which game/task family to run: `simple_maze`, `complex_maze`, or `arc_agi` |\n| `max_turns` | int | `100` | Maximum number of turns before the rollout times out |\n| `system_prompt` | str | built-in | Override the default agent instructions |\n| `game_id` | str \\| null | `null` | Optional ARC-AGI game identifier to fetch from the API |\n| `level_index` | int | `-1` | Maze level index to evaluate for `simple_maze` or `complex_maze`; `-1` means all built-in levels |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward: `1.0` if terminal state is `WIN`, otherwise `0.0` |\n| `num_actions` | Number 
of actions taken by the agent, sourced from the engine action count |\n| `timed_out` | `1.0` if the rollout stopped because `max_turns` was reached, otherwise `0.0` |\n\n### Testing\n\nSmoke eval:\n\n```bash\nset -a\nsource .env\nset +a\nPYTHONPATH=environments/arc_agi_3_env uv run --project environments/arc_agi_3_env vf-eval arc-agi-3-env \\\n  -m gpt-4.1-mini \\\n  -b https://api.openai.com/v1 \\\n  -k OPENAI_API_KEY \\\n  -a '{\"game_family\":\"simple_maze\",\"level_index\":0,\"max_turns\":8}' \\\n  -n 1 -r 1\n```\n\nStructured recording:\n\n```bash\nuv run --project environments/arc_agi_3_env \\\n  python environments/arc_agi_3_env/evals/run_recording.py \\\n  --game-family simple_maze \\\n  --level-index 0 \\\n  --max-turns 8\n```\n\nThis writes an ARC-style `.recording.jsonl` under `environments/arc_agi_3_env/evals/recordings/`.\n\nSelect a specific ARC-AGI puzzle:\n\n```bash\nset -a\nsource .env\nset +a\n\nuv run --project environments/arc_agi_3_env \\\n  python environments/arc_agi_3_env/evals/run_recording.py \\\n  --game-family arc_agi \\\n  --game-id ft09-0d8bbf25 \\\n  --max-turns 3\n```\n\nDiscover available ARC-AGI puzzle ids:\n\n```bash\nset -a\nsource .env\nset +a\n\npython - <<'PY'\nimport os, requests\nr = requests.get(\n    f\"{os.environ['ROOT_URL'].rstrip('/')}/api/games\",\n    headers={\"X-API-Key\": os.environ[\"ARC_API_KEY\"], \"Accept\": \"application/json\"},\n    timeout=30,\n)\nr.raise_for_status()\nfor item in r.json():\n    print(item[\"game_id\"])\nPY\n```\n\n### How It Works\n\nThis environment adapts `arcengine` game loops and ARC-AGI task instances into a PrimeIntellect `MultiTurnEnv`:\n\n1. **Environment Loader** (`arc_agi_3_env.py`):\n   - Creates a PrimeIntellect-compatible multi-turn environment\n   - Selects a game family from env args\n   - Initializes built-in `arcengine` mazes or fetches ARC-AGI tasks from the ARC API\n\n2. 
**Game Adapter**:\n   - Instantiates a fresh game/task per rollout\n   - Serializes the current state into a text observation\n   - Parses model outputs into `arcengine` actions\n   - Advances the environment with `perform_action(...)`\n   - Supports `ACTION6` click actions via XML `<x>` / `<y>` tags\n\n3. **Termination and Metrics**:\n   - Marks success when the game reaches `GameState.WIN`\n   - Marks failure when the game reaches `GameState.GAME_OVER`\n   - Marks timeout when `max_turns` is reached\n   - Emits `reward`, `num_actions`, and `timed_out`\n\n### Relevant Links\n\n- [PrimeIntellect Verifiers Environments](https://docs.primeintellect.ai/verifiers/environments)\n- [ARCEngine Package](https://pypi.org/project/arcengine/)\n- [Evals README](environments/arc_agi_3_env/evals/README.md)\n","encoding":"utf-8","truncated":false,"total_bytes":5359},"status":null}