# arc-agi-3

### Overview

- **Environment ID**: `arc-agi-3`
- **Short description**: ARC-AGI-3 sandbox that proxies the official competition API through the Verifiers agent loop.
- **Tags**: ARC-AGI, tools, multi-turn

### Datasets

- **Primary dataset(s)**: Live ARC-AGI-3 games chosen via `game_id`.
- **Source links**: [ARC Prize](https://three.arcprize.org/)
- **Split sizes**: N/A (games are fetched on demand from the official API).

### Task

- **Type**: multi-turn tool use
- **Parser**: JSON-only responses (custom extractor)
- **Rubric overview**: single reward that awards `1.0` when the run finishes in `WIN`, `0.0` otherwise.

### Quickstart

1. Export your competition key:

```bash
export ARC_API_KEY="sk-..."
```

2. Run an evaluation with the default starter puzzle:

```bash
uv run vf-eval arc-agi-3 -n 1 -r 1
```

3. Target specific games and tweak limits:

```bash
uv run vf-eval arc-agi-3 \
  -a '{"games": ["ls20", "meta"], "max_actions": 60}' \
  -m gpt-4.1-mini -n 1 -r 1
```

Notes:

- The environment streams real game states from the ARC servers; runs consume live attempts.
- Actions must be emitted as JSON per the system prompt. Invalid JSON will be rejected with an error turn.
- Scorecards are closed automatically after you emit the final summary JSON.
- The environment requires the official [`arc-agi-3-agents`](https://github.com/arcprize/ARC-AGI-3-Agents) package for data models and scorecard validation.

### Environment Arguments

| Arg               | Type               | Default                      | Description |
| ----------------- | ------------------ | ---------------------------- | ----------- |
| `games`           | list[str \| dict]  | `["ls20"]`                   | Game IDs (or dicts with `game_id`, optional `prompt`, `tags`). |
| `base_url`        | str                | `https://three.arcprize.org` | ARC API root (`http(s)://host[:port]`). |
| `api_key`         | str                | `ARC_API_KEY` env            | Competition API token. Required to issue actions. |
| `max_actions`     | int                | `80`                         | Maximum number of action turns before forcing a summary. |
| `request_timeout` | float              | `10.0`                       | HTTP timeout (seconds) for ARC API calls. |
| `tags`            | list[str]          | `[]`                         | Extra scorecard tags appended when opening runs. |

### Metrics

| Metric         | Meaning |
| -------------- | ------- |
| `reward`       | `1.0` if the final ARC state is `WIN`, else `0.0`. |
| `final_score`  | Latest score reported by the API (mirrors scorecard). |
| `action_count` | Number of API actions submitted in the run. |

For more detail on the agent protocol and scorecard endpoints, see the official [ARC-AGI-3 Agents](https://github.com/arcprize/ARC-AGI-3-Agents) reference implementation.
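
The reward rule and the `-a` argument format described in this README can be sketched in a few lines of Python. This is only an illustrative sketch, not the environment's actual code: `compute_reward` and `build_env_args` are hypothetical helper names, and it assumes the final game state is available as a plain string such as `"WIN"`.

```python
import json
import shlex


def compute_reward(final_state: str) -> float:
    """Return 1.0 when the run ended in WIN, else 0.0 (the rubric stated in this README)."""
    return 1.0 if final_state == "WIN" else 0.0


def build_env_args(games, max_actions=80):
    """Serialize environment arguments into the JSON string passed to `vf-eval -a`.

    Hypothetical helper; only `games` and `max_actions` from the arguments table are covered.
    """
    return json.dumps({"games": list(games), "max_actions": max_actions})


if __name__ == "__main__":
    env_args = build_env_args(["ls20", "meta"], max_actions=60)
    # shlex.quote makes the JSON safe to paste into a POSIX shell command.
    print("uv run vf-eval arc-agi-3 -a", shlex.quote(env_args), "-n 1 -r 1")
    # "GAME_OVER" stands in for any non-WIN state here.
    print(compute_reward("WIN"), compute_reward("GAME_OVER"))
```

Building the argument string with `json.dumps` rather than by hand avoids quoting mistakes, which matters because the `-a` value must be valid JSON.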