{"data":{"kind":"file","path":"README.md","version_id":"bp0rqfxvkyv512hi21hij90y","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2509,"modified_at":"2025-09-20T17:44:14.251000","content_hash":"f28f9b0e6e4a6986e4af9022d29f4f4dd6adb77a4141b52f5eeef055c037cbf8"},"entries":[],"content":"# ascii-grid-maze\n\n### Overview\n- **Environment ID**: `ascii-grid-maze`\n- **Short description**: Multi-turn grid maze with optional moving blockers, damage tiles, a key, and a goal point. The map is rendered using emojis. Designed to mimic some of the games shown on the ARC-AGI 3 Preview.\n- **Tags**: games, multi-turn, xml, feedback, maze\n\n### Datasets\n- **Primary dataset(s)**: Maze maps generated dynamically by the script\n- **Source links**: N/A\n- **Split sizes**: Customizable\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `think/move`.\n- **Rubric overview**: Format adherence, a step penalty that incentivizes shorter paths, and a per-turn reward for exploring new areas or reaching the goal\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval ascii-grid-maze\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval ascii-grid-maze \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 2000, \"num_eval_examples\": 20, \"map_width\": 20, \"seed\": 0, \"descriptive\": false, \"damage_multiplier\": 5, \"death_penalty\": -1000.0}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\nThe environment accepts the following arguments. 
\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `2000` | Number of training episodes |\n| `num_eval_examples` | int | `20` | Number of evaluation episodes |\n| `map_width` | int | `\"bar\"` | Side length `n` of the square `n x n` grid |\n| `seed` | int | `0` | Random number generator seed used to generate maps |\n| `descriptive` | bool | `true` | Whether displayed values are labeled with context. E.g. with `\"descriptive\": true` the health is shown as `Health: 20`; with `false` it is shown as just `20` |\n| `damage_multiplier` | int | `5` | How much damage is received for stepping on a damage tile |\n| `death_penalty` | float | `-1000.0` | The reward received for dying (running out of health) |\n\n### Metrics\nKey metrics emitted by the rubric and how they are interpreted.\n\n| Metric | Meaning |\n| ------ | ------- |\n| `total_reward_func` | +10.0 for reaching the goal, +1.0 for visiting a new position, +0.0 for revisiting an old position, and `death_penalty` for running out of health |\n| `steps_metric` | Slight reward penalty for solutions that take more turns |\n| `format_reward` | Adherence to the expected XML format |\n\n","encoding":"utf-8","truncated":false,"total_bytes":2509},"status":null}