{"data":{"kind":"file","path":"README.md","version_id":"jxeuqkdcizp67o878zptxqsp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2188,"modified_at":"2025-09-01T14:11:09.634000","content_hash":"1cb1ba3e6acc90e804e6762c2f1eaec50f9ed7782df6f3f6278dc17a99b3728d"},"entries":[],"content":"# battleship-micro\n\n### Overview\n- **Environment ID**: `battleship-micro`\n- **Short description**: Multi-turn Battleship on a 6×6 grid with one size‑3 and one size‑2 ship. The model fires using `<fire>row,col</fire>` and optionally thinks in `<think>…</think>`.\n- **Tags**: games, grid, multi-turn, eval, train\n\n### Datasets\n- **Primary dataset**: Synthetic boards generated on-the-fly with a per-row `board_seed` for determinism\n- **Split sizes**: train=200, eval=50 (configurable)\n\n### Task\n- **Type**: multi-turn chat\n- **Parser**: custom extractor for `<fire>r,c</fire>`\n- **Rubric overview**: weighted sum of: `success_reward`, `efficiency_reward`, `progress_reward`, `sunk_bonus_reward`, `format_reward`\n\n### Quickstart\nRun an evaluation with defaults:\n\n```bash\nuv run vf-eval battleship-micro\n```\n\nConfigure model, sampling, and env args:\n\n```bash\nuv run vf-eval battleship-micro \\\n  -m openai/gpt-5-chat \\\n  -n 20 -r 3 -t 512 -T 0.7 \\\n  -a '{\"use_think\": true, \"max_turns\": 25}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `grid_size` | int | `6` | Board dimension (rows=cols=grid_size). Must be ≥4. |\n| `ship_lengths` | list[int] | `[3, 2]` | Fleet composition (cell lengths). |\n| `num_train_samples` | int | `200` | Number of training rows. |\n| `num_eval_samples` | int | `50` | Number of eval rows. |\n| `max_turns` | int | `25` | Assistant turns budget per example. |\n| `use_think` | bool | `true` | Allow `<think>…</think>` before firing. |\n| `seed` | int | `1337` | RNG seed for dataset generation. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum of rubric criteria |\n| `success_reward` | 1.0 iff all ships sunk by end of episode |\n| `efficiency_reward` | `total_ship_cells / shots_used` when solved (capped at 1.0) |\n| `progress_reward` | `hits / total_ship_cells` |\n| `sunk_bonus_reward` | Fraction of ships sunk |\n| `format_reward` | Share of assistant turns with valid `<fire>r,c</fire>` |\n\n### Evaluation Reports\nReports will auto-render here after hub runs.\n\n","encoding":"utf-8","truncated":false,"total_bytes":2188},"status":null}