# minesweeper-expert

### Overview
- **Environment ID**: `minesweeper-expert`
- **Short description**: Expert Minesweeper (16×30 grid, 99 mines) in which the first cleared cell is guaranteed to be a zero (no adjacent mines). Two actions are available: clear or flag.
- **Tags**: game, multi-turn, minesweeper, flags, reasoning, eval, train

### Datasets
- **Primary dataset(s)**: Synthetic episodes generated at runtime. Each sample encodes the grid specification and random seed in its `question` and `info` fields.
- **Source links**: n/a (no external dataset)
- **Split sizes**: Controlled by environment arguments. The defaults are `num_train_examples=2000` and `num_eval_examples=20`.

### Task
- **Type**: multi-turn
- **Parser**: Minimal custom parser for XML-style tags. Each turn may include an optional `<think>…</think>` block and must contain exactly one action:
  - `<clear>row,col</clear>`
  - `<flag>row,col</flag>`
- **Rubric overview**:
  - `win_reward`: 1.0 when the game is won.
  - `coverage_reward`: fraction of safe cells revealed.
  - `efficiency_reward`: higher score for winning in fewer turns.
  - `flag_quality_reward`: small bonus for correct flags, penalty for incorrect flags.
  - `format_reward`: rewards turns that contain exactly one valid action tag.
  - `invalid_penalty`: penalty for out-of-bounds, duplicate, or blocked actions.

### Quickstart
Run an evaluation with the default settings:

```bash
uv run vf-eval minesweeper-expert
```
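
The turn format described under **Task** (an optional `<think>` block plus exactly one `<clear>` or `<flag>` tag) can be checked with a small regex-based parser. This is a hypothetical sketch, not the environment's own parser: `parse_action` is an illustrative name, and it assumes integer `row,col` coordinates with optional spaces around the comma. Tags inside the `<think>` block are ignored so that reasoning which mentions a tag is not counted as an action.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch; the environment's real parser may differ.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# \1 forces the closing tag to match the opening one (clear/clear, flag/flag).
# Negative numbers are accepted here so a later bounds check can penalize them.
ACTION_RE = re.compile(r"<(clear|flag)>\s*(-?\d+)\s*,\s*(-?\d+)\s*</\1>")


def parse_action(text: str) -> Optional[Tuple[str, int, int]]:
    """Return (action, row, col) if the turn has exactly one action tag, else None."""
    visible = THINK_RE.sub("", text)  # drop the optional reasoning block
    matches = ACTION_RE.findall(visible)
    if len(matches) != 1:
        return None  # zero or several actions: a format violation
    action, row, col = matches[0]
    return action, int(row), int(col)


print(parse_action("<think>corner looks safe</think><clear>0,0</clear>"))
# ('clear', 0, 0)
print(parse_action("<flag>3, 7</flag>"))
# ('flag', 3, 7)
print(parse_action("<clear>1,1</clear><flag>2,2</flag>"))
# None
```

Returning `None` for both "no action" and "too many actions" keeps the format check a single branch; a scorer could then map `None` to a zero format score.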
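
A common way to provide the first-click zero guarantee is to place mines only after the first clear, excluding the clicked cell and its up to eight neighbors. The sketch below illustrates that technique under stated assumptions; it is not necessarily how this environment generates boards. `place_mines`, `neighbors`, and `adjacent_count` are hypothetical names, coordinates are assumed to be 0-indexed `(row, col)`, and the episode seed is assumed to drive Python's `random.Random`.

```python
import random
from typing import Iterator, Set, Tuple

Cell = Tuple[int, int]
ROWS, COLS, MINES = 16, 30, 99  # expert dimensions from the overview


def neighbors(r: int, c: int, rows: int = ROWS, cols: int = COLS) -> Iterator[Cell]:
    """Yield the in-bounds cells adjacent to (r, c), excluding (r, c) itself."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc


def place_mines(first: Cell, seed: int, rows: int = ROWS, cols: int = COLS,
                mines: int = MINES) -> Set[Cell]:
    """Sample mine positions so the first cleared cell is a zero."""
    # Keeping the first cell and all of its neighbors mine-free is exactly
    # what makes the first cell's adjacent-mine count zero.
    forbidden = {first, *neighbors(*first, rows, cols)}
    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if (r, c) not in forbidden]
    rng = random.Random(seed)  # seeded so an episode can be replayed exactly
    return set(rng.sample(candidates, mines))


def adjacent_count(cell: Cell, mine_set: Set[Cell],
                   rows: int = ROWS, cols: int = COLS) -> int:
    """Number of mines touching `cell` (the digit a revealed cell shows)."""
    return sum(n in mine_set for n in neighbors(*cell, rows, cols))


mine_set = place_mines((8, 15), seed=42)
print(len(mine_set), adjacent_count((8, 15), mine_set), (8, 15) in mine_set)
# 99 0 False
```

On a 16×30 board (480 cells), excluding at most 9 cells still leaves at least 471 candidates, so sampling 99 mines always succeeds.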
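
The coverage term is described as the fraction of safe cells revealed. The sketch below computes that fraction and, because coverage depends on how many cells one clear uncovers, pairs it with a breadth-first flood fill. It assumes standard Minesweeper auto-reveal (clearing a zero cell also reveals its neighbors), which this README does not spell out. `reveal` and `coverage_fraction` are hypothetical stand-ins, not the environment's `coverage_reward` implementation.

```python
from collections import deque
from typing import Iterator, Set, Tuple

Cell = Tuple[int, int]


def neighbors(r: int, c: int, rows: int, cols: int) -> Iterator[Cell]:
    """Yield the in-bounds cells adjacent to (r, c), excluding (r, c) itself."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc


def reveal(start: Cell, mines: Set[Cell], rows: int, cols: int) -> Set[Cell]:
    """Reveal `start`; if it has zero adjacent mines, flood-fill outward (BFS)."""
    revealed: Set[Cell] = set()
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        # Mines are never added; in a real game clearing one ends the episode.
        if cell in revealed or cell in mines:
            continue
        revealed.add(cell)
        nbrs = list(neighbors(*cell, rows, cols))
        if not any(n in mines for n in nbrs):
            # A zero cell: every neighbor is safe, so reveal those too.
            queue.extend(n for n in nbrs if n not in revealed)
    return revealed


def coverage_fraction(revealed: Set[Cell], mines: Set[Cell],
                      rows: int, cols: int) -> float:
    """Fraction of safe (non-mine) cells that have been revealed."""
    safe_total = rows * cols - len(mines)
    return len(revealed - mines) / safe_total


mines = {(1, 1)}  # center mine: clearing a corner shows a 1, no cascade
print(coverage_fraction(reveal((0, 0), mines, 3, 3), mines, 3, 3))
# 0.125
mines = {(0, 0)}  # corner mine: clearing the far corner cascades everywhere
print(coverage_fraction(reveal((2, 2), mines, 3, 3), mines, 3, 3))
# 1.0
```

On the expert board the denominator is 16 × 30 − 99 = 381 safe cells.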