{"data":{"kind":"file","path":"README.md","version_id":"m2i3z1mofv1gp801dm3lea56","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4250,"modified_at":"2025-11-29T01:55:27.557000","content_hash":"9a37bfa082b744d944ae0882dd3089931be8b6da3b7fbf94861a72a13a132fc5"},"entries":[],"content":"# taxi\n\n### Overview\n- **Environment ID**: `taxi`\n- **Short description**: Multi-turn Taxi navigation game where an LLM agent must navigate a 5x5 grid to pick up a passenger and deliver them to their destination while avoiding walls.\n- **Tags**: games, multi-turn, navigation, reasoning, xml, pickup-dropoff\n\n### Datasets\n- **Primary dataset(s)**: Self-generated episodes (no external dataset required)\n- **Source links**: Classic RL environment (OpenAI Gym / Gymnasium)\n- **Split sizes**: Number of episodes controlled via args\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `action` field\n- **Rubric overview**: Completion reward, efficiency bonus, progress reward, and format check\n\n### Game Description\n\nThe agent controls a taxi navigating a 5x5 grid:\n\n```\n+---------+\n|R: | : :G|\n| : | : : |\n| : : : : |\n| | : | : |\n|Y| : |B: |\n+---------+\n```\n\n**Grid Symbols:**\n- **X**: Taxi (empty)\n- **T**: Taxi (carrying passenger)\n- **R**: Red location (top-left, row 0, col 0)\n- **G**: Green location (top-right, row 0, col 4)\n- **Y**: Yellow location (bottom-left, row 4, col 0)\n- **B**: Blue location (bottom-right, row 4, col 3)\n- **|**: Wall (cannot pass through)\n- **:**: Open path\n\n**Actions (6 total):**\n- `SOUTH`: Move one cell down\n- `NORTH`: Move one cell up\n- `EAST`: Move one cell right\n- `WEST`: Move one cell left\n- `PICKUP`: Pick up passenger (must be at passenger's location)\n- `DROPOFF`: Drop off passenger (must be carrying passenger and at destination)\n\n**Rewards:**\n- Each step: **-1** (encourages efficiency)\n- Successful dropoff: **+20** (game ends with success)\n- Illegal 
pickup/dropoff: **-10** (wrong location or no passenger)\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval taxi\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval taxi \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 1000, \"num_eval_examples\": 20, \"max_steps\": 100}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `1000` | Number of training episodes |\n| `num_eval_examples` | int | `20` | Number of evaluation episodes |\n| `max_steps` | int | `100` | Maximum steps per episode |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `_completion_reward_func` | 1.0 if passenger delivered, -0.5 if ran out of steps |\n| `_efficiency_reward_func` | Bonus (up to 0.5) for completing in fewer steps |\n| `_progress_reward_func` | 0.2 bonus for picking up passenger (even if not delivered) |\n| `format_reward` | Adherence to expected XML format (weight 0.1) |\n\n### Example Interaction\n\n**System prompt** instructs the agent on rules, grid layout, and format.\n\n**Initial state**:\n```\n=== Taxi ===\nGrid:\n+---------+\n|R: | : :G|\n| : | : : |\n| :X: : : |\n| | : | : |\n|Y| : |B: |\n+---------+\n\nLegend:\n- X: Taxi (empty)\n- T: Taxi (with passenger)\n- R, G, Y, B: Pickup/dropoff locations\n- | : Wall (cannot pass through)\n- : : Open path\n\nTaxi position: Row 2, Column 1\nPassenger: waiting at R\nDestination: B (Blue (bottom-right))\nSteps taken: 0\nTotal reward: 0.0\n=============\n```\n\n**Agent response**:\n```\nI'm at row 2, col 1. 
The passenger is at R (row 0, col 0).\nI need to go west and then north. The wall in rows 0-1 sits between columns 1 and 2, so it does not block me.\nColumn 0 is open from row 2 up to row 0.\nLet me go west first.\n<action>WEST</action>\n```\n\n### Wall Layout\n\nThe grid has walls that block horizontal movement between certain cells (column pairs read off the grid above):\n- Wall between columns 1-2 at rows 0-1 (blocking the direct path east from R along the top rows)\n- Wall between columns 0-1 at rows 3-4 (blocking access to Y from the right)\n- Wall between columns 2-3 at rows 3-4 (blocking access to B from the left)\n\nRow 2 has no walls, so it is the only row where the taxi can cross every column boundary. The agent must plan routes that navigate around these walls.\n\n### Strategy Tips\n\n1. **Plan the route**: Consider walls when planning the path to the passenger and then to the destination.\n2. **Minimize steps**: Each step costs -1, so efficiency matters.\n3. **Avoid illegal actions**: Attempting PICKUP away from the passenger's location, or DROPOFF at the wrong location or without a passenger, costs -10.\n4. **Two-phase task**: First navigate to passenger → PICKUP → navigate to destination → DROPOFF.\n","encoding":"utf-8","truncated":false,"total_bytes":4250},"status":null}