{"data":{"kind":"file","path":"README.md","version_id":"uar3ko3ohgdau7zmkd2kq5bj","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1874,"modified_at":"2025-10-18T10:00:12.656000","content_hash":"81e5e55ab6ced42ca3887efad6d0f589e6f1b52723b95630ae31f60918ac10b0"},"entries":[],"content":"# chess-puzzles\n\n### Overview\n- **Environment ID**: `chess-puzzles`\n- **Short description**: Multi-turn chess puzzle environment based on Lichess puzzles\n- **Tags**: chess, games, multi-turn\n\n### Datasets\n- **Primary dataset(s)**: `Lichess/chess-puzzles`\n- **Source links**: [Dataset](https://huggingface.co/datasets/Lichess/chess-puzzles)\n\n### Task\n- **Type**: multi-turn\n- **Parser**: `UCIParser` - extracts UCI chess notation moves from model responses\n- **Rubric overview**: The reward incorporates whether the model makes correct moves (`correct_move_reward`), plays legal moves (`legal_move_reward`), and completes the puzzle (`completion_reward`) with weights [1.0, 0.5, 2.0]\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval chess-puzzles\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval chess-puzzles \\\n  -m gpt-5-nano \\\n  -n 4 -r 4 \\\n  -a '{\"min_rating\": 500, \"max_rating\": 800, \"themes\": [\"mateIn4\"]}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_puzzles` | int | `100` | Number of puzzles to load from the dataset |\n| `seed` | int | `None` | Random seed for dataset shuffling (uses random if None) |\n| `min_rating` | int | `400` | Minimum puzzle rating filter |\n| `max_rating` | int | `600` | Maximum puzzle rating filter |\n| `themes` | List[str] | `[\"mateIn2\"]` | List of puzzle themes to filter by (e.g., \"mate\", \"fork\", \"endgame\") |\n| `show_legal_moves` | bool | `True` | Whether to include legal moves in the prompt |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of all criteria) |\n| `correct_move_reward` | Count of correct puzzle moves made by the model |\n| `legal_move_reward` | Ratio of legal moves to total expected moves |\n| `completion_reward` | 1.0 if puzzle is solved, else 0.0 |","encoding":"utf-8","truncated":false,"total_bytes":1874},"status":null}