{"data":{"kind":"file","path":"README.md","version_id":"aqn0rwzoiis8qdt57fpsf40n","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5188,"modified_at":"2025-12-03T22:46:05.112000","content_hash":"79b7d4601b93923564e9d186c8822f045022f709c765df5375f1f7f32dafc5de"},"entries":[],"content":"# pacman-array\n\n### Overview\n- **Environment ID**: `pacman-array`\n- **Short description**: Multi-turn Pacman game where an LLM agent navigates a 10x10 grid to eat all dots while avoiding ghosts. When powered up (after eating a power pellet), Pacman can eat ghosts for bonus points.\n- **Tags**: games, multi-turn, navigation, reasoning, xml\n\n### Datasets\n- **Primary dataset(s)**: Self-generated episodes (no external dataset required)\n- **Source links**: Classic arcade game (Pacman)\n- **Split sizes**: Number of episodes controlled via args\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `action` field\n- **Rubric overview**: Win reward, score reward, survival reward, and format check\n\n### Game Description\n\nThe agent plays Pacman on a 10x10 grid:\n\n**Grid Elements:**\n- **E**: Empty space\n- **W**: Wall (cannot pass through)\n- **.**: Dot/Pellet (10 points, +0.1 reward)\n- **O**: Power Pellet (50 points, +0.5 reward, enables ghost eating)\n- **P**: Pacman (the agent)\n- **G**: Ghost (avoid unless powered up)\n\n**Actions (4 total):**\n- `0` (UP): Move up (row - 1)\n- `1` (DOWN): Move down (row + 1)\n- `2` (LEFT): Move left (col - 1)\n- `3` (RIGHT): Move right (col + 1)\n\n**Game Flow:**\n1. Pacman starts at a fixed position on the grid\n2. Ghosts move towards Pacman using Manhattan distance (they try to get closer each turn)\n3. Pacman must eat all dots (`.`) and power pellets (`O`) to win\n4. If Pacman touches a ghost while not powered up, the game ends (death)\n5. After eating a power pellet, Pacman is powered up for 15 steps and can eat ghosts for bonus points\n6. 
The game ends when all dots are eaten (win), Pacman dies (loss), or the maximum number of steps is reached\n\n**Rewards:**\n- Eat dot (`.`): **+0.1**\n- Eat power pellet (`O`): **+0.5**\n- Eat ghost (when powered up): **+2.0**\n- Win (all dots eaten): **+10.0**\n- Death (touched by ghost): **0.0** (game ends)\n- Each step: **0.0**\n\n**Grid Layout:**\nThe 10x10 grid has walls forming corridors around four 2x2 wall blocks (rows 2-3 and 6-7, columns 2-3 and 6-7, as shown in the example state below). The layout is fixed across episodes, with walls creating a maze-like structure that requires strategic navigation.\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval pacman-array\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval pacman-array \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 1000, \"num_eval_examples\": 20, \"max_steps\": 200}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `1000` | Number of training episodes |\n| `num_eval_examples` | int | `20` | Number of evaluation episodes |\n| `max_steps` | int | `200` | Maximum steps per episode |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `_win_reward_func` | 1.0 if agent wins (eats all dots), else 0.0 |\n| `_score_reward_func` | Normalized score reward (up to 1.0, typical winning score is ~2000-3000) |\n| `_survival_reward_func` | Progress reward based on dots eaten (normalized by total dots) |\n| `format_reward` | Adherence to expected XML format (weight 0.1) |\n\n### Example Interaction\n\n**System prompt** instructs the agent on rules, grid layout, actions, and format.\n\n**Initial state**:\n```\n2D Array Representation (10x10):\nLegend: E=Empty, W=Wall, .=Pellet, O=PowerPellet, P=Pacman, G=Ghost\n\nRow/Col:  0  1  2  3  4  5  6  7  8  9\n------------------------------------------\nRow  0:  .  .  .  .  .  .  .  .  .  .\nRow  1:  .  .  .  .  .  .  .  .  .  .\nRow  2:  .  .  W  W  .  .  W  W  .  
.\nRow  3:  .  .  W  W  .  .  W  W  .  .\nRow  4:  .  .  .  .  .  .  .  .  .  .\nRow  5:  .  .  .  .  .  .  .  .  .  .\nRow  6:  .  .  W  W  .  .  W  W  .  .\nRow  7:  .  .  W  W  .  .  W  W  .  .\nRow  8:  .  .  .  .  .  .  .  .  .  .\nRow  9:  .  .  .  .  .  .  .  .  .  .\n\nScore: 0 | Reward: 0.00 | Steps: 0\nDots Remaining: 84\nPowered Up: False | Power Timer: 0\nPacman Coordinates (row, column): (9, 4)\nGhost Coordinates (row, column): (0, 4), (0, 5), (9, 5)\nPellet Coordinates (row, column): (0, 0), (0, 9), (9, 0), (9, 9), ...\n```\n\n**Agent response**:\n```\nI'm at position (9, 4) near the bottom center. There are ghosts at (0, 4), (0, 5), and (9, 5).\nThe ghost at (9, 5) is very close - I should move away from it.\nI'll move left to (9, 3) to put distance between me and the nearby ghost.\n<action>2</action>\n```\n\n**Environment response**:\n```\nAction: 2 (LEFT) - Reward: 0.10\n\n2D Array Representation (10x10):\n...\nScore: 10 | Reward: 0.10 | Steps: 1\nDots Remaining: 83\nPowered Up: False | Power Timer: 0\nPacman Coordinates (row, column): (9, 3)\nGhost Coordinates (row, column): (0, 3), (0, 4), (9, 4)\n...\n```\n\n### Strategy Tips\n\n1. **Avoid ghosts**: Keep distance from ghosts unless powered up. Ghosts move towards Pacman each turn.\n2. **Power pellets**: Use power pellets strategically - they enable ghost eating for 15 steps and provide bonus points.\n3. **Efficiency**: Plan routes to eat multiple dots in sequence while avoiding dead ends.\n4. **Walls**: The grid has walls forming corridors - plan paths that avoid getting trapped.\n5. **Ghost behavior**: Ghosts use Manhattan distance to move closer to Pacman - predict their movement to avoid collisions.\n","encoding":"utf-8","truncated":false,"total_bytes":5188},"status":null}