{"data":{"kind":"file","path":"README.md","version_id":"vomju7a55lim3eanbyckmnen","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5097,"modified_at":"2026-05-08T05:31:45.360000","content_hash":"85e314fa107bbc674287d5d85657714b19fb65feb7039b485d56842e38fd463b"},"entries":[],"content":"# slitherlink-env\n\n### Overview\n- **Environment ID**: `slitherlink-env`\n- **Short description**: Multi-turn Slitherlink environment with exact rule-based verification\n- **Tags**: slitherlink, puzzle, reasoning, constraints, grid, multi-turn, train, eval\n\n### Datasets\n- **Primary dataset(s)**: Puzzle Loop 7x7 Hard Slitherlink boards stored as package data\n- **Source links**: Public puzzle-loop.com boards mirrored in `slitherlink_env/data`\n- **Default split sizes**: 15 train examples, 10 eval examples, 25 total boards\n- **Split policy**: train and eval clue grids are intentionally disjoint\n\n### Task\n- **Type**: multi-turn\n- **Output format**: exactly one strict `submit_board` tool call containing full `horizontal` and `vertical` edge grids\n- **Rubric overview**: solve reward plus dense structural scores for hidden-solution edge accuracy, visible clue satisfaction, vertex validity, connectivity, malformed submissions, and turn count\n\n### Quickstart\nInstall from the Prime Environments Hub:\n\n```bash\nprime env install savi/slitherlink-env\n```\n\nRun an evaluation with default settings:\n\n```bash\nprime eval run savi/slitherlink-env\n```\n\nChoose a provider and model explicitly:\n\n```bash\nprime eval run savi/slitherlink-env --provider prime -m gpt-5-nano\nprime eval run savi/slitherlink-env --provider openai -m gpt-4.1-mini\n```\n\nTo force direct OpenAI API billing with a local `OPENAI_API_KEY` instead of Prime Inference, pass the OpenAI base URL and key variable explicitly:\n\n```bash\nprime eval run savi/slitherlink-env \\\n  --api-base-url https://api.openai.com/v1 \\\n  --api-key-var OPENAI_API_KEY \\\n  --api-client-type openai_chat_completions \\\n  -m gpt-5.5 \\\n  -n 1 \\\n  -r 1 \\\n  -c 1 \\\n  -s \\\n  -o outputs/evals/gpt-5.5-openai-n1-r1 \\\n  -a '{\"max_turns\": 6}'\n```\n\nNotes:\n- The environment itself is provider-agnostic; it works with any model/provider supported by `prime eval run`.\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Eval logs include the board id, revision number, and board-state feedback to make stuck rollouts easier to diagnose.\n\nRun the 7x7 Hard board-state benchmark:\n\n```bash\nprime eval run savi/slitherlink-env \\\n  --hosted \\\n  -m openai/gpt-5.5 \\\n  -n 10 \\\n  -r 1 \\\n  -a '{\"dataset_variant\": \"7x7-hard\", \"max_turns\": 6}' \\\n  -S '{\"reasoning_effort\": \"medium\"}'\n```\n\n### Required Environment Variables\n- **For the environment package itself**: none\n- **For evaluation**: credentials depend on your chosen provider. `prime eval run` supports Prime Inference plus multiple OpenAI-compatible and non-OpenAI providers, including `openai`, `anthropic`, `openrouter`, `local`, and `vllm`.\n\n### Local Development\nFrom a source checkout, run the validation and test suite:\n\n```bash\nuv run python scripts/audit_board_data.py\nuv run --extra dev python -m pytest\nuv build\n```\n\nRepository-only board-maintenance helpers:\n\n```bash\nuv run python scripts/add_board.py --split eval --record-file /path/to/board.json\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_train_examples` | int | `-1` | Limit the train split size (`-1` uses all shipped rows) |\n| `max_eval_examples` | int | `-1` | Limit the eval split size (`-1` uses all shipped rows) |\n| `max_turns` | int | `6` | Max evaluated full-board revisions per puzzle |\n| `dataset_variant` | string | `\"7x7-hard\"` | Use `\"7x7-hard\"`, `\"10x10-normal\"`, or `\"10x10-hard\"` for Puzzle Loop benchmarks, or `\"default\"` for the original 5x5 fixture set |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `board_solved_score` | `1.0` when the submitted line edges solve the puzzle |\n| `edge_accuracy_score` | Dense hidden-solution edge accuracy used for training signal |\n| `clue_score` | Visible clue-count satisfaction score |\n| `vertex_score` | Fraction of touched vertices with valid degree 2 |\n| `connectivity_score` | Largest connected line component fraction |\n| `malformed_submission_count` | Count of invalid `submit_board` calls |\n| `turn_count_metric` | Number of evaluated board revisions |\n\n### Notes\n- The shipped JSON boards are exact-verification fixtures bundled with the package.\n- The `7x7-hard` variant contains 15 train and 10 eval boards sourced from public puzzle-loop 7x7 Hard puzzle pages, with source puzzle IDs and solution hashes retained in metadata.\n- The `10x10-normal` and `10x10-hard` variants each contain 10 train and 5 eval boards sourced from public puzzle-loop 10x10 pages.\n- Runtime only exposes visible clues; hidden solutions stay in `info` for tests and debugging.\n- The system prompt loads its packaged heuristic guide from `slitherlink_env/heuristics/slitherlink_techniques.md` so you can iterate on solver strategy without hard-coding every pattern into Python.\n- The add-board script, audit script, and difficulty-rubric design doc are repository-only development tools; they are intentionally not part of the installed hub package.\n- The environment exposes one strict tool, `submit_board`, so models can reason globally and revise a full candidate grid in fewer turns.\n","encoding":"utf-8","truncated":false,"total_bytes":5097},"status":null}