{"data":{"kind":"file","path":"README.md","version_id":"szka8bmgv99tbh7z0g93jsa3","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5566,"modified_at":"2025-12-04T08:03:00.147000","content_hash":"237de7aa2cc786111fe5db289e53ccfd82d66ef82a5e80c65774bfb8bd78a7f5"},"entries":[],"content":"# flappybird\n\n### Overview\n- **Environment ID**: `flappybird`\n- **Short description**: Multi-turn Flappy Bird control environment where an LLM decides TAP or NOOP each tick to navigate through pipes.\n- **Tags**: `rl`, `game`, `control`, `verifiers`, `real-time`\n\nA physics-based Flappy Bird simulation where the agent controls a bird navigating through gaps in scrolling pipes. The bird falls due to gravity and must tap to jump, avoiding collisions with pipes and boundaries. Features progressive difficulty ramping from easy initial gaps to tighter challenges.\n\n### Datasets\n- **Primary dataset**: Synthetic game states generated from fresh `FlappyGame` instances\n- **Source**: Custom implementation with configurable physics\n- **Split sizes**: Configurable via `num_examples` (default 10)\n\n### Task\n- **Type**: Multi-turn real-time control\n- **Parser**: `vf.XMLParser(fields=[\"think\", \"actions\"], answer_field=\"actions\")`\n- **Rubric overview**:\n  - `flappy_reward_func` (weight 2.0): Number of pipes successfully passed\n  - `flappy_survival_reward` (weight 0.2): Accumulated survival bonus with time-based multiplier\n  - Parser format reward (weight 0.2): Enforces `<THINK>`/`<ACTIONS>` XML structure\n\n### Action Format\n\nThe agent outputs one action per tick:\n\n```xml\n<THINK>Brief plan using physics projection</THINK>\n<ACTIONS>[TAP]</ACTIONS>\n```\n\nor for no-op:\n\n```xml\n<THINK>Safe position, let gravity adjust</THINK>\n<ACTIONS>[]</ACTIONS>\n```\n\n**Available actions**:\n- `[TAP]`: Set vertical velocity to jump impulse (+1.6)\n- `[]`: No action, gravity applies (-0.3 per tick)\n\n### Game Physics\n\n| Parameter | Default | Description |\n| --------- | ------- | ----------- |\n| `world_w` | 24.0 | World width |\n| `world_h` | 20.0 | World height (vertical bounds: ±10) |\n| `bird_x` | 4.0 | Fixed horizontal bird position |\n| `bird_radius` | 0.20 | Collision radius |\n| `gravity` | -0.30 | Downward acceleration per tick |\n| `jump_impulse` | 1.6 | Upward velocity on TAP |\n| `pipe_speed` | 0.30 | Leftward pipe movement per tick |\n| `easy_gap_height` | 6.0 | Initial gap height (easy mode) |\n| `base_gap_height` | 8.0 | Final gap height (after ramp) |\n| `easy_mode_pipes` | 6 | Pipes before difficulty ramp begins |\n| `ramp_pipes` | 12 | Pipes over which difficulty increases |\n\n### Observation Format\n\n```xml\n<FLAPPY id=K>\n<OBS>\nbirdY:By\nbirdX:Bx\nvelY:Vy\ngapHeight:G\nbirdRadius:R\npipes:[(x1,gapY1),(x2,gapY2),...]\nscore:Z\n</OBS>\n</FLAPPY>\n```\n\nWhere:\n- `By`: Bird's vertical position (range: [-10, 10])\n- `Bx`: Bird's horizontal position (fixed at 4.0)\n- `Vy`: Vertical velocity (positive = up, negative = down)\n- `G`: Current gap height\n- `R`: Bird collision radius\n- `pipes`: List of (x, gapY) for visible pipes\n- `Z`: Current score (pipes passed)\n\n### Quickstart\n\nRun with defaults:\n\n```bash\nuv run vf-eval flappybird\n```\n\nConfigure model and game parameters:\n\n```bash\nuv run vf-eval flappybird \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 512 -T 0.7 \\\n  -a '{\n    \"max_turns\": 500,\n    \"config_overrides\": {\n      \"easy_gap_height\": 7.0,\n      \"gravity\": -0.25\n    }\n  }'\n```\n\nNotes:\n- Use `-a/--env-args` for JSON kwargs forwarded to `load_environment()`\n- Reports are written to `./environments/flappybird/reports/`\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_turns` | int | `300` | Maximum ticks before episode ends |\n| `num_examples` | int | `10` | Number of game instances in dataset |\n| `config` | FlappyConfig \\| null | `null` | Full config object (overrides defaults) |\n| `config_overrides` | dict \\| null | `null` | Partial overrides merged with defaults |\n\n### Config Overrides\n\nAll `FlappyConfig` fields can be overridden:\n\n| Field | Type | Default | Description |\n| ----- | ---- | ------- | ----------- |\n| `world_w` | float | 24.0 | World width |\n| `world_h` | float | 20.0 | World height |\n| `bird_x` | float | 4.0 | Bird's fixed X position |\n| `bird_radius` | float | 0.20 | Bird collision radius |\n| `gravity` | float | -0.30 | Gravity acceleration |\n| `jump_impulse` | float | 1.6 | TAP velocity |\n| `pipe_speed` | float | 0.30 | Pipe scroll speed |\n| `pipe_spawn_interval` | int | 30 | Ticks between pipe spawns |\n| `first_spawn_interval` | int | 10 | Ticks before first pipe |\n| `base_gap_height` | float | 8.0 | Final gap height |\n| `easy_gap_height` | float | 6.0 | Initial gap height |\n| `easy_mode_pipes` | int | 6 | Pipes at easy difficulty |\n| `ramp_pipes` | int | 12 | Pipes during difficulty ramp |\n| `gap_center_offset` | float | 2.0 | Max gap center deviation from center |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `pipes_passed` | Number of pipes successfully navigated |\n| `survival_score` | Accumulated survival bonus (higher for longer runs) |\n| `done` | Episode termination flag |\n| `reward` | Weighted combination: 2.0 × pipes + 0.2 × survival + 0.2 × format |\n\n### Survival Reward Details\n\nThe survival reward accumulates each tick the bird stays alive:\n- Base: 1.0 per tick\n- Multiplier: 1.0 + 0.005 × min(step, 80)\n- Encourages both survival and sustained play\n\n### Collision Detection\n\nThe bird is treated as a circle with center (Bx, By) and radius R:\n- **Boundary collision**: `By + R ≥ +10` or `By - R ≤ -10`\n- **Pipe collision**: When bird's X range overlaps pipe's X range AND bird's Y range exceeds the gap\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval flappybird</code> to generate one.</p>\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":5566},"status":null}