{"data":{"kind":"file","path":"README.md","version_id":"uc6366gc7y8g7k7k2x2u7uc6","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2818,"modified_at":"2025-09-09T00:38:41.013000","content_hash":"d18c982b2e130a80aa4f997c9d4293b8dae7a4eef8e42edc465c6589e18c3663"},"entries":[],"content":"# rubiks\n\n### Overview\n- **Environment ID**: `rubiks`\n- **Short description**: Rubik's cube solving environment for training LLMs on spatial reasoning and algorithmic problem-solving\n- **Tags**: puzzle, spatial-reasoning, multi-step, algorithms\n\n### Datasets\n- **Primary dataset(s)**: Randomly generated cube scrambles of varying difficulty\n- **Source links**: Generated using standard cube scrambling algorithms\n- **Split sizes**: Configurable (default: 100 train / 20 eval)\n\n### Task\n- **Type**: multi-turn\n- **Parser**: XMLParser with think/moves fields\n- **Rubric overview**:\n  - Solving completion (1.0 weight)\n  - Efficiency based on move count (0.5 weight)\n  - Progress tracking for partial solutions (0.3 weight)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval rubiks\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval rubiks \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"difficulty\": \"easy\", \"use_strategy\": true}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `100` | Number of training scrambles to generate |\n| `num_eval_examples` | int | `20` | Number of evaluation scrambles to generate |\n| `use_strategy` | bool | `true` | Whether to suggest the CFOP strategy in the system prompt |\n| `difficulty` | str | `\"easy\"` | Scramble difficulty: \"easy\" (5-10 moves), \"medium\" (15-20), \"hard\" (20-25) |\n| `max_turns` | int | `100` | Maximum number of solving attempts allowed |\n| `representation` | str | `\"emoji\"` | How the cube colors are represented (see Cube Notation) |\n\n### Metrics\n\n| 
Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum of all reward components |\n| `check_solved_reward_func` | 1.0 if the cube is solved, 0.0 otherwise |\n| `efficiency_reward_func` | Higher score for fewer moves (max 1.0 for ≤20 moves) |\n| `progress_reward_func` | Partial credit for reaching solving milestones (cross, F2L, etc.) |\n| `valid_moves_reward_func` | Percentage of turns with valid cube notation |\n| `format_reward_func` | XML format compliance score |\n\n### Cube Notation\nThe environment uses standard Rubik's cube notation:\n- **Face moves**: U (Up), D (Down), L (Left), R (Right), F (Front), B (Back)\n- **Modifiers**: ' (counter-clockwise), 2 (180 degrees)\n- **Examples**: U, U', U2, R U R', F2 L' D\n\nYou can also choose how each square is represented, as an emoji (⬜️🟨🟦🟥🟩🟧)\nor as a string (wybrgo), by passing in `representation`.\n\n### Future Work\nSome cool ideas to get to:\n- Breaking up the solve into separate goals to hit\n- Implementing other methods (ZZ, Roux, etc.) and seeing whether any are more amenable to LLMs\n- Trying out different ways to represent the cube, such as a more intuitive layout or tokens that are better represented in the literature\n- VLLMs et al.","encoding":"utf-8","truncated":false,"total_bytes":2818},"status":null}