{"data":{"kind":"file","path":"README.md","version_id":"nbuikmuvzxjb2ghai0gc0d4r","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2857,"modified_at":"2026-01-15T00:35:31.589000","content_hash":"94984031c0df151e4e579525bebabdb32144e7fd2a2ae915d28d464dfdd473dc"},"entries":[],"content":"# tablut\n\n### Overview\n- **Environment ID**: `tablut`\n- **Short description**: Ancient Viking board game where the LLM plays as defender (White) or attacker (Black) against a heuristic/random opponent\n- **Tags**: `game`, `strategy`, `board-game`, `multi-turn`\n\n### Game Rules\n\nTablut is a Viking-era asymmetric strategy game on a 9x9 board:\n\n- **White (Defenders)**: 1 King + 8 soldiers. Goal: escape the King to any edge escape tile\n- **Black (Attackers)**: 16 soldiers. Goal: capture the King by surrounding it\n- **Movement**: All pieces move like rooks (any distance horizontally/vertically, no jumping)\n- **Capture**: Sandwich enemy pieces between two of yours. King needs 4 attackers to capture (or 3 if adjacent to castle, or 2 if elsewhere)\n- **Special tiles**: Castle (center, King only), Camps (Black starting positions, no re-entry after leaving)\n\n### Datasets\n- **Primary dataset(s)**: Procedurally generated game states\n- **Source links**: Generated programmatically in environment loader\n- **Split sizes**: Configurable via `num_examples` (default: 100)\n\n### Task\n- **Type**: multi-turn\n- **Parser**: XMLParser with `<move>` field\n- **Rubric overview**: Win reward (1.0), format compliance (0.2), invalid move penalty, game length penalty\n\n### Quickstart\n\nRun an evaluation:\n\n```bash\nprime eval run tablut -m gpt-4o-mini -n 20\n```\n\nWith custom configuration:\n\n```bash\nprime eval run tablut \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 \\\n  -a '{\"llm_plays_as\": \"white\", \"max_game_moves\": 40}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `100` | Number of game instances in dataset |\n| `llm_plays_as` | str | `\"white\"` | Side LLM plays: `\"white\"`, `\"black\"`, or `\"random\"` |\n| `max_turns` | int | `30` | Maximum conversation turns |\n| `max_game_moves` | int | `40` | Maximum game moves before draw |\n| `min_random_prob` | float | `0.0` | Min opponent randomness (0=optimal, 1=random) |\n| `max_random_prob` | float | `1.0` | Max opponent randomness |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `win_reward_func` | 1.0 for win, 0.3 for draw, 0.0 for loss |\n| `format_reward_func` | Score for correct `<move>` XML formatting |\n| `invalid_move_penalty_func` | -0.1 per invalid move (capped at -0.3) |\n| `game_length_penalty_func` | -0.01 per move after 10 (capped at -0.4) |\n\n### How It Works\n\n1. **Environment Loader** (`tablut.py`):\n   - Generates game instances with configurable opponent difficulty\n   - Creates prompts with board state and move validation\n   - Supports White, Black, or random side assignment\n\n2. **Heuristic Opponent**:\n   - Minimax with alpha-beta pruning (depth 3)\n   - Configurable randomness for difficulty scaling\n\n3. **Move Format**:\n   ```\n   <think>Strategic reasoning...</think>\n   <move>(row,col) to (row,col)</move>\n   ```\n","encoding":"utf-8","truncated":false,"total_bytes":2857},"status":null}