{"data":{"kind":"file","path":"README.md","version_id":"g0qjr2lev78jgsqegc2ajxgb","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2326,"modified_at":"2025-10-12T15:59:21.212000","content_hash":"ce343c039e88fb0ddc5a093a9e1fdc900581011d768a014597ed5b63a7eb52d5"},"entries":[],"content":"# mini-sudoku\n\n### Overview\n\n- **Environment ID**: `mini-sudoku`\n- **Short description**: Solve 4x4 sudoku puzzle; rewards correctness, (optional) partial correctness, and format.\n- **Tags**: puzzles, single-turn, sudoku, xml\n\n### Datasets\n\n- **Primary dataset(s)**: `metavind/mini-sudoku`\n- **Source links**: <https://huggingface.co/datasets/metavind/mini-sudoku>\n- **Split sizes**: 192 train examples, 96 test examples (per difficulty level)\n- **Difficulty levels**:\n  - `easy`: 1-4 empty cells\n  - `medium`: 5-8 empty cells\n  - `hard`: 9-12 empty cells\n- **Note**: The dataset is filtered by the `difficulty` parameter during environment initialization\n\n### Task\n\n- **Type**: single-turn\n- **Parser**: `XMLParser([\"answer\"])`\n- **Rubric overview**: Exact solution match, (optional) partial correctness credit, and format check.\n- **System prompt**: `Solve the following 4x4 sudoku puzzle by replacing all _ instances with the correct number such that each row, each column, and each of the four 2x2 blocks contains all numbers 1-4 exactly once. Provide your answer between <answer> and </answer> tags.`\n\n<table>\n  <tr>\n    <th>Input Format</th>\n    <th>Expected Answer</th>\n  </tr>\n  <tr>\n    <td><pre>3 _ 4 _\n4 _ 3 2\n_ _ 1 4\n_ _ _ 3</pre></td>\n    <td><pre>3 2 4 1\n4 1 3 2\n2 3 1 4\n1 4 2 3\n</pre></td>\n  </tr>\n</table>\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval mini-sudoku\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval mini-sudoku \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 2048 -T 0.7 \\\n  -a '{\"difficulty\": \"hard\"}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Number of training examples to use (use -1 for all) |\n| `num_eval_examples` | int | `-1` | Number of evaluation examples to use (use -1 for all) |\n| `difficulty` | str | `\"medium\"` | Difficulty level to filter dataset by |\n| `include_partial_credit` | bool | `True` | Whether to award partial credit for correct cells |\n\n### Metrics\n\n| Metric | Weight | Range | Meaning |\n| ------ | ------ | ----- | ------- |\n| Format reward | 0.1 | 0.0-0.8 | Adherence to XML format |\n| Partial credit reward | 0.01 | 0.0-16.0 | Number of correct cells |\n| Correct answer reward | 1.0 | 0.0-1.0 | Solution matches exactly |\n","encoding":"utf-8","truncated":false,"total_bytes":2326},"status":null}