{"data":{"kind":"file","path":"README.md","version_id":"tbir9f1btmb5kgzcea0buxqz","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3024,"modified_at":"2025-09-23T23:03:17.667000","content_hash":"0cc0949fffb907323e6f9446775c60a159b95cbad6b7902d94846398b87312f2"},"entries":[],"content":"# videopoker\n\n### Overview\n- **Environment ID**: `videopoker`\n- **Short description**: Single-turn Jacks or Better video poker environment that scores actions by their exact expected payout.\n- **Tags**: games, single-turn, rl\n\n### Datasets\n- **Primary dataset(s)**: Synthetic Jacks or Better hands sampled from a standard 52-card deck when the environment loads.\n- **Source links**: N/A (data is generated on the fly).\n- **Split sizes**: `num_hands` prompts are generated for the train split (defaults to 200). Provide a custom evaluation dataset via `eval_dataset` if desired.\n\n### Task\n- **Type**: single-turn\n- **Parser**: Default verifiers parser (no custom parsing required).\n- **Rubric overview**: One reward function, `video_poker_reward`, enumerates every possible redraw outcome and returns the exact expected payout implied by the model's HOLD decision.\n\n### Action format\nEach prompt shows a five-card hand with zero-based indices and the paytable that defines the payouts. The model must reply in the format `HOLD: i j k`, listing the indices of the cards to keep in ascending order (omit indices to discard all cards). The environment discards every non-held card, evaluates every possible redraw from the remaining deck, and uses the average payout as the reward.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval videopoker\n```\n\nConfigure model, sampling, and environment parameters:\n\n```bash\nuv run vf-eval videopoker \\\n  -m gpt-4.1-nano \\\n  -n 5 -r 1 -t 1024 -T 0.7 \\\n  -a '{\"num_hands\": 50, \"seed\": 42}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Provide your own `dataset` or `eval_dataset` via `--env-args` if you want to evaluate on a fixed set of hands.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_hands` | int | `200` | Number of synthetic prompts to generate when no dataset is supplied. |\n| `seed` | int or null | `null` | Seed for reproducible hand generation. |\n| `paytable` | mapping | Jacks or Better defaults | Optional mapping from hand categories (e.g., `\"royal_flush\"`) to payout values. |\n| `dataset` | `datasets.Dataset` or null | `null` | Pre-built dataset to use for training/evaluation instead of synthetic generation. |\n| `eval_dataset` | `datasets.Dataset` or null | `null` | Optional evaluation split. |\n| `rubric` | `verifiers.rubrics.Rubric` or null | `null` | Custom rubric; defaults to the built-in expected-value scorer. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward equal to the expected payout of the HOLD decision. |\n| `video_poker_reward` | Raw metric emitted by the rubric (identical to `reward`). |\n\n### Programmatic usage\n\n```python\nfrom videopoker import load_environment\n\nenv = load_environment(num_hands=5, seed=0)\nprint(env.dataset[0][\"prompt\"])  # Inspect the first prompt\n```\n\n## Evaluation Reports\nThis section is reserved for auto-generated evaluation reports.\n","encoding":"utf-8","truncated":false,"total_bytes":3024},"status":null}