{"data":{"kind":"file","path":"README.md","version_id":"hj8g2el2ihuq21orylzfruur","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2849,"modified_at":"2025-08-23T23:49:18.960000","content_hash":"56db8861533c3bec8bdb5ab28ce33c3285ed1631bf7178a7195fe1de112a05da"},"entries":[],"content":"# Logic Puzzle Solver Environment\n\n### Overview\n- **Environment ID**: `logic-puzzle-solver`\n- **Short description**: An environment for solving logic puzzles with deductive reasoning\n- **Tags**: logic, puzzles, reasoning, deduction, multi-turn\n\n### Datasets\n- **Primary dataset(s)**: Procedurally generated logic puzzles of varying difficulty\n- **Source**: Generated at runtime using the `PuzzleGenerator` class\n- **Split sizes**: Configurable via `num_puzzles` parameter (default: 100 puzzles)\n\n### Task\n- **Type**: Multi-turn conversational reasoning\n- **Parser**: XMLParser with tags for `reasoning`, `deductions`, `question`, and `answer`\n- **Rubric overview**: Correctness (50%), efficiency (20%), reasoning quality (20%), format compliance (10%)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval logic-puzzle-solver\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval logic-puzzle-solver -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"num_puzzles\": 50, \"difficulty_distribution\": {\"easy\": 0.4, \"medium\": 0.4, \"hard\": 0.2}}'  \n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_puzzles` | int | `100` | Number of puzzles to generate |\n| `difficulty_distribution` | dict | `{\"easy\": 0.3, \"medium\": 0.4, \"hard\": 0.3}` | Distribution of puzzle difficulties |\n| `max_turns` | int | `10` | Maximum number of turns allowed per puzzle |\n| `min_turns` | int | `3` | Minimum number of turns before solution attempt is allowed |\n| `seed` | int | `42` | Random seed for reproducibility |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of criteria) |\n| `correctness_score` | Accuracy of the final solution (50% of reward) |\n| `efficiency_score` | Solving the puzzle in fewer turns (20% of reward) |\n| `reasoning_quality_score` | Quality and structure of reasoning (20% of reward) |\n| `format_compliance_score` | Following the required response format (10% of reward) |\n\n### Puzzle Structure\n\nEach puzzle consists of:\n- **Entities**: Individuals (e.g., Alice, Bob, Charlie)\n- **Attributes**: Categories with values (e.g., colors, pets, drinks)\n- **Clues**: Statements about relationships between entities and attributes\n\nThe task requires deducing which entity has which attribute values based on the given clues.\n\n### Clue Types\n\nThe environment generates various types of clues:\n1. **Direct clues**: \"Alice has the red color.\"\n2. **Negative clues**: \"Bob does not have the cat.\"\n3. **Relationship clues**: \"The person with the cat has the blue color.\"\n4. **Either/or clues**: \"Either Alice or Bob has the dog (but not both).\"\n5. **Combination clues**: \"Alice and Bob together have the cat and dog.\"\n\n","encoding":"utf-8","truncated":false,"total_bytes":2849},"status":null}