# mastermind

### Overview
- **Environment ID**: `mastermind`
- **Short description**: A code-breaking game in which the agent guesses a secret numeric code using red/white feedback
- **Tags**: reasoning, constraint-satisfaction, deduction, game, multi-turn

### Datasets
- **Primary dataset(s)**: Procedurally generated secret codes
- **Source links**: Generated on the fly by `create_mastermind_dataset()`
- **Split sizes**: 1000 train / 100 eval by default (configurable via `num_train_examples` and `num_eval_examples`)

### Task
- **Type**: multi-turn
- **Parser**: `Parser` (the basic parser; guesses are space-separated numbers)
- **Rubric overview**: Binary success reward: 1.0 for a win, 0.0 for a loss (matches the ICRL implementation)

### Quickstart
Run an evaluation with default settings:

```bash
uv run vf-eval mastermind
```

Configure model and sampling:

```bash
uv run vf-eval mastermind \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"code_length": 4, "num_colors": 6, "max_turns": 15}'
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.
- Each episode allows up to `max_turns` guesses (default 15).

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `num_train_examples` | int | `1000` | Number of training game instances |
| `num_eval_examples` | int | `100` | Number of evaluation game instances |
| `code_length` | int | `4` | Number of digits in the secret code |
| `num_colors` | int | `6` | Number of possible digit values (each digit ranges from 1 to `num_colors`) |
| `max_turns` | int | `15` | Maximum number of guesses allowed per episode |
| `seed` | int (optional) | `None` | Random seed for dataset generation |

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `reward` | Binary success (1.0 if the code is guessed correctly, 0.0 otherwise) |
| `reward_mean` | Success rate across episodes (matches ICRL's `success_rate`) |
| `reward_std` | Standard deviation of the success outcomes |

### Additional State Tracking

The environment also records the following fields in the episode state (they do not contribute to the reward):
- `attempts`: Number of guesses made
- `won`: Boolean flag indicating whether the code was guessed
- `secret`: The secret code for the current episode

### Game Rules

The agent plays Mastermind as follows:
1. A secret code of `code_length` digits is chosen at random (each digit from 1 to `num_colors`).
2. Digits may repeat within the code.
3. The agent submits each guess as space-separated digits, `d1 d2 d3 d4` for the default `code_length` of 4 (e.g., `1 3 4 2`).
4. After each guess, feedback is provided:
   - **Red**: Correct digit in the correct position
   - **White**: Correct digit in the wrong position
   - Each digit of the secret can be matched at most once, so repeated digits in a guess are not double-counted.
5. The agent wins by guessing the exact code.
6. The game ends when the code is guessed or `max_turns` is reached.

### Feedback Examples

**Example 1:**
- Secret code: `3 1 4 6`
- Guess: `1 4 6 2`
- Feedback: `0 red, 3 white`
  - 1, 4, and 6 are in the code but in the wrong positions
  - 2 is not in the code

**Example 2 (with repeats):**
- Secret code: `4 4 2 1`
- Guess: `4 2 4 3`
- Feedback: `1 red, 2 white`
  - The 4 in position 1 is in the correct position (red)
  - The 4 in position 3 matches the secret's other 4, which is in position 2 (white)
  - The 2 in position 2 is in the code but belongs in position 3 (white)
  - 3 is not in the code

### Reward Function

**Binary success**: A simple win/loss indicator
- 1.0 if the agent guesses the correct code within `max_turns` attempts
- 0.0 if the agent fails to guess it within `max_turns` attempts
- No partial credit or efficiency bonuses
- Directly matches the ICRL implementation
- Because each episode scores either 0 or 1, the average reward across episodes equals the success rate

### Strategy Notes

Effective strategies include:
- **Constraint satisfaction**: Eliminate candidate codes that are inconsistent with the feedback received so far
- **Information gathering**: Choose early guesses that split the remaining candidates as evenly as possible, maximizing information gain
- **Deduction**: Use red/white feedback to narrow down which digits appear in the code and where
- **Systematic search**: Explore the space of possible codes (6^4 = 1296 with the default settings) methodically rather than at random