{"data":{"kind":"file","path":"README.md","version_id":"sr3o2zh6vk4p5puxfeor0x2m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2853,"modified_at":"2025-08-29T18:54:30.934000","content_hash":"3a818679c580617dd143bcdf74db77597e0c9121778a274bb34cfbefde769693"},"entries":[],"content":"# game-of-life\n\n### Overview\n- **Environment ID**: `game-of-life`\n- **Short description**: Conway's Game of Life\n- **Tags**: single-turn, game, grid, alife\n\n### Datasets\n- **Primary dataset(s)**: Generated from a Game of Life (GoL) implementation\n- **Source links**: n/a\n- **Split sizes**: n/a\n\n### Task\n- **Type**: single-turn\n- **Parser**: custom\n- **Rubric overview**: Scores the returned grid on correctness (with an increasing decay) and validity (correct height and width, wrapped in `\\boxed{...}`)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval game-of-life\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval game-of-life \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"H\": 16, \"W\": 16, \"seed\": 1337, \"num_steps\": 1}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\nDocument any supported environment arguments and their meaning. 
Example:\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `system_prompt` | str | `DEFAULT_SYSTEM_PROMPT` | System prompt for model calls |\n| `H` | int | `16` | Height of the game grid |\n| `W` | int | `16` | Width of the game grid |\n| `num_steps` | int | `1` | How many steps ahead to predict |\n| `p_alive` | float | `0.1` | Probability that a cell is alive in the initial state |\n| `seed` | int | `None` | Seed for the RNG; if `None`, a random seed between 0 and 2**8 is used |\n| `correctness_decay` | int | `2` | Strength of the decay applied to the correctness score |\n| `correctness_threshold` | float | `0.1` | If correctness is below this threshold, 0 is returned instead |\n| `weights` | dict | `{'correctness': 1.0, 'validity': 0.5}` | Weights for the reward components |\n\n```python\nDEFAULT_SYSTEM_PROMPT = \"\"\"\\\nYou are given a Conway's Game of Life grid of size H×W.\nThe current state is provided as H lines of W characters where:\n- '1' = live cell, '0' = dead cell.\nApply the standard rules with zero-padded (non-toroidal) boundaries:\n  1) A live cell with 2 or 3 live neighbors survives; otherwise it dies.\n  2) A dead cell with exactly 3 live neighbors becomes alive.\n\nThink step by step to determine the state NUM_STEPS step[s] in the future.\nDon't write a script or use other tools - the task is to generate the next state directly.\n\nReturn your final solution as H lines each with W characters wrapped in a \\\\boxed{...} element.\nExample (3×3 just to illustrate the format):\n\\\\boxed{\n010\n111\n010\n}\n\"\"\"\n```\n\n### Metrics\nKey metrics emitted by the rubric and how to interpret them:\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of criteria) |\n| `correctness` | Fraction of correct cells with exponential decay applied |\n| `validity` | Boolean value based on correct height, width and output formatting 
|\n\n","encoding":"utf-8","truncated":false,"total_bytes":2853},"status":null}