{"data":{"kind":"file","path":"README.md","version_id":"zeey57bql8v1k8ptnbagbt2i","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3017,"modified_at":"2025-09-10T03:18:18.976000","content_hash":"1dfa6aeedc0eaaa55d9f7904cb631549687c00e91519560144572aafc3a2e816"},"entries":[],"content":"# Battleship\r\n\r\n### Overview\r\n- **Environment ID**: `battleship`\r\n- **Short description**: Multi-turn environment where models play the classic game of Battleship by strategically guessing coordinates to locate and sink all enemy ships.\r\n- **Tags**: game, multi-turn, strategy\r\n- **Source Implementation**: Original implementation for RL training\r\n- **Socials**: [GitHub @ljt019](https://github.com/ljt019), [HF @ljt019](https://huggingface.co/ljt019), [X @Ljt019117161](https://x.com/Ljt019117161)\r\n\r\n### Datasets\r\n- **Primary dataset(s)**:\r\n  - Generated gameplay scenarios (configurable number of games)\r\n\r\n### Task & Scoring\r\n- **Type**: multi-turn strategy game\r\n- **Parser**: XMLParser extracts coordinates from `<guess>COORDINATE</guess>` tags\r\n- **Rubric overview**: Weighted scoring based on victory, hits, strategic play, and board coverage\r\n\r\n**Game Mechanics:**\r\n\r\nModels receive:\r\n1. Current game state with XML tags showing results, remaining ships, and hit/miss/sunk status\r\n2. Visual grid representation (10x10 board)\r\n3. 
Ship fleet: Carrier(5), Battleship(4), Cruiser(3), Submarine(3), Destroyer(2)\r\n\r\nModels must respond with `<guess>COORDINATE</guess>`, using coordinates such as `a1`, `e5`, or `j10`.\r\n\r\n**Grid Symbols:**\r\n- `?` unknown/unguessed cell\r\n- `o` miss\r\n- `x` hit (ship not sunk)\r\n- `s` sunk ship part\r\n\r\n**Expected Response Format:**\r\n```\r\nStrategic reasoning about next move\r\n<guess>e5</guess>\r\n```\r\n\r\nThe game continues until:\r\n- **Victory**: All ships are sunk\r\n- **Turn Limit**: The maximum number of turns (`max_turns`) is reached\r\n\r\n### Quickstart\r\n\r\nRun an evaluation with default settings:\r\n```bash\r\nuv run vf-eval battleship\r\n```\r\n\r\nBrowse results:\r\n```bash\r\nuv run vf-tui\r\n```\r\n\r\n## Environment Arguments\r\n\r\n| Arg           | Type         | Default           | Description                              |\r\n| ------------- | ------------ | ----------------- | ---------------------------------------- |\r\n| `max_turns`   | int          | `50`              | Maximum number of moves allowed          |\r\n| `num_games`   | int          | `1000`            | Number of games in dataset               |\r\n| `seed`        | int          | `5656`            | Seed for reproducible games              |\r\n\r\n---\r\n\r\n## Metrics\r\n\r\n| Metric                           | Weight | Meaning                                               |\r\n| -------------------------------- | ------ | ----------------------------------------------------- |\r\n| `reward`                         | -      | Final weighted rubric score (0.0 to 2.01)             |\r\n| `victory_reward`                 | 0.6    | Full reward (1.0) if all ships are sunk               |\r\n| `hit_reward`                     | 0.4    | Reward for each hit (0.03 per hit)                    |\r\n| `strategic_hit_reward`           | 0.5    | Reward for hits adjacent to a prior hit (follow-up)   |\r\n| `coverage_efficiency_reward`     | 0.3    | Reward for good board coverage and exploration        |\r\n| `format_reward`       
           | 0.2    | Reward for proper `<guess>COORDINATE</guess>` format  |\r\n\r\n---","encoding":"utf-8","truncated":false,"total_bytes":3017},"status":null}