{"data":{"kind":"file","path":"README.md","version_id":"dhj1yirnzauq4uba2poi9gib","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2911,"modified_at":"2025-11-28T02:39:13.758000","content_hash":"157933d47615a7dc34cddf4e5c6c1556407da68dab84ac63b61bae99591d14dc"},"entries":[],"content":"# frozen_lake\n\n### Overview\n- **Environment ID**: `frozen_lake`\n- **Short description**: Multi-turn Frozen Lake grid navigation game where an LLM agent must navigate from start to goal while avoiding holes on slippery ice.\n- **Tags**: games, multi-turn, navigation, reasoning, xml\n\n### Datasets\n- **Primary dataset(s)**: Self-generated episodes (no external dataset required)\n- **Source links**: Classic RL environment (OpenAI Gym / Gymnasium)\n- **Split sizes**: Number of episodes controlled via args\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `action` field\n- **Rubric overview**: Goal completion reward, efficiency bonus, and format check\n\n### Game Description\n\nThe agent navigates a frozen lake grid:\n- **S**: Start position\n- **G**: Goal (reach this to win)\n- **H**: Hole (falling in ends the game)\n- **F**: Frozen ice (safe to walk on)\n- **A**: Agent's current position\n\nThe agent can move in 4 directions: `LEFT`, `DOWN`, `RIGHT`, `UP`.\n\n**Slippery Ice**: By default (`is_slippery=True`), the ice is slippery. 
When the agent chooses a direction, there's a 1/3 chance of moving in the intended direction and a 1/3 chance of slipping to each of the two perpendicular directions.\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval frozen_lake\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval frozen_lake \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 1000, \"num_eval_examples\": 20, \"is_slippery\": true}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `1000` | Number of training episodes |\n| `num_eval_examples` | int | `20` | Number of evaluation episodes |\n| `map_name` | str | `\"4x4\"` | Map size (`\"4x4\"` or `\"8x8\"`) |\n| `is_slippery` | bool | `true` | Whether the ice is slippery (stochastic transitions) |\n| `max_steps` | int | `50` | Maximum steps per episode |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `_goal_reward_func` | 1.0 if the agent reaches the goal, else 0.0 |\n| `_efficiency_reward_func` | Bonus (up to 0.5) for reaching the goal in fewer steps |\n| `format_reward` | Adherence to expected XML format (weight 0.1) |\n\n### Example Interaction\n\n**System prompt** instructs the agent on rules and format.\n\n**Initial state**:\n```\n=== Frozen Lake ===\nGrid (A=Agent, S=Start, G=Goal, H=Hole, F=Frozen):\nAFFF\nFHFH\nFFFH\nHFFG\n\nPosition: Row 0, Column 0\nSteps taken: 0\n===================\n```\n\n**Agent response**:\n```\nLooking at the grid, I'm at (0,0) which is the start. 
The goal is at (3,3).\nI need to navigate right and down while avoiding holes at (1,1), (1,3), (2,3), (3,0).\nThe safest first move is to go right.\n<action>RIGHT</action>\n```\n\n### Maps\n\n**4x4 Map (default)**:\n```\nSFFF\nFHFH\nFFFH\nHFFG\n```\n\n**8x8 Map**:\n```\nSFFFFFFF\nFFFFFFFF\nFFFHFFFF\nFFFFFHFF\nFFFHFFFF\nFHHFFFHF\nFHFFHFHF\nFFFHFFFG\n```\n\n","encoding":"utf-8","truncated":false,"total_bytes":2911},"status":null}