{"data":{"kind":"file","path":"README.md","version_id":"yd0y46ia0op2tui74s7vii1y","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2887,"modified_at":"2025-09-18T07:39:54.081000","content_hash":"290bfdfe5ec47d76e32637e8e7974e7b0b4c04b99f535319eef70412f93d13fd"},"entries":[],"content":"# lights-out\n\n### Overview\n- **Environment ID**: `lights-out`\n- **Short description**: Grid-based game where selecting a cell toggles that light and its neighbors. Goal: turn all lights off.\n- **Tags**: eval, train, game, multi-turn, grid\n\n### Datasets\n- **Primary dataset(s)**: `scandukuri/lights-out-3x3`, a synthetic dataset of 3×3 Lights Out grids built by sampling random initial states and deterministically computing the canonical solution with `numpy`.\n- **Source links**: [Dataset](https://huggingface.co/datasets/scandukuri/lights-out-3x3), [proof of solvability](https://www.jstor.org/stable/2687202) for particular board dimensions\n- **Split sizes**: train split = `270`, test split = `30`\n\n### Task\n- **Type**: multi-turn\n- **Parser**: `XMLParser`\n- **Rubric overview**: The reward combines whether the model turns all lights off, whether the canonical (minimal) solution length was matched, how efficiently the board was solved, and whether responses were formatted correctly (`solved_reward`, `minimal_solution_reward`, `efficiency_reward`, `format_reward_func`).\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval lights-out\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval lights-out   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_turns` | int | `12` | Maximum number of moves the model may make before the game ends (the game ends early if the board is solved). |\n| `use_think` | bool | `False` | Switches to a prompt that asks the model to think inside `<think>…</think>` before outputting `<step>row,col</step>`. |\n| `show_canonical` | bool | `False` | Includes the minimal-solution step count in the prompt (when available). |\n| `dataset_spec` | str | `scandukuri/lights-out-3x3` | HF dataset to load. Must supply square `initial_state` boards (0/1) and optionally `minimal_solution_steps`; size need not be 3×3. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of all criteria) |\n| `format_reward_func` | Format compliance: checks whether the model produced a valid `<step>...</step>` output. Inherited from `verifiers.parsers.xml_parser.XMLParser`. |\n| `solved_reward` | 1.0 if the board is solved at the end of the rollout, else 0.0 |\n| `minimal_solution_reward` | 1.0 if solved using exactly the minimal number of moves, else 0.0 |\n| `efficiency_reward` | Higher if solved in fewer moves relative to the `max_turns` budget (computed as `max(0.0, 1.0 - (turns_taken / max_turns))`) |\n","encoding":"utf-8","truncated":false,"total_bytes":2887},"status":null}