# poker-winner

### Overview
- **Environment ID**: `poker-winner`
- **Short description**: Identify which of eight players wins a Texas Hold'em hand described in text.
- **Tags**: poker, reasoning, accuracy, chat

### Datasets
- **Primary dataset(s)**: Synthetic 8-player Texas Hold'em deals, each with exactly one winner (no split pots).
- **Source links**: None; hands are generated locally at load time.
- **Split sizes**: Train and eval share the same synthetic set (size set by `num_examples`, default 64).

### Task
- **Type**: single-turn
- **Parser**: Default `Parser` (chat)
- **Rubric overview**: Exact-match winner check via a custom reward function that extracts the predicted player number from the model response.

### Quickstart
Run an evaluation with default settings:

```bash
uv run vf-eval poker-winner -s
```

Configure the model, sampling, and environment arguments:

```bash
uv run vf-eval poker-winner \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"num_examples": 100, "seed": 42}'
```

Using OpenRouter (note the required `/api/v1` base URL):

```bash
uv run vf-eval poker-winner \
  -m x-ai/grok-4.1-fast:free \
  -b https://openrouter.ai/api/v1 \
  -k OPENROUTER_API_KEY \
  -s -n 5 -r 1
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.

### Environment Arguments
| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `num_examples` | int | `64` | Number of synthetic hands to generate. |
| `seed` | int | `13` | RNG seed for reproducible dealing. |
| `max_examples` | int | `-1` | Optional cap on dataset size; `-1` uses all generated hands. |

### Metrics
| Metric | Meaning |
| ------ | ------- |
| `reward` | 1.0 for a correct winner prediction, 0.0 otherwise (identical to `accuracy`). |
| `accuracy` | Exact match on the winning player number extracted from the model response. |
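The `accuracy` metric scores a response by extracting a player number and comparing it to the true winner. The following is a minimal sketch of that idea, not this environment's actual reward code: the helper names `extract_player_number` and `winner_reward`, the assumption that the model answers in a "Player N" form with N from 1 to 8, and the rule of taking the last mention as the final answer are all hypothetical choices made for illustration.

```python
import re
from typing import Optional

# Hypothetical answer format: assumes the winner is written as "Player N" (N = 1..8).
# The trailing \b stops "Player 10" from being read as player 1.
_PLAYER_RE = re.compile(r"\bplayer\s*#?\s*([1-8])\b", re.IGNORECASE)


def extract_player_number(response: str) -> Optional[int]:
    """Return the last player number mentioned in the response, or None if absent."""
    matches = _PLAYER_RE.findall(response)
    # Assumption: the final answer is the last player the model mentions.
    return int(matches[-1]) if matches else None


def winner_reward(response: str, answer: int) -> float:
    """Exact match: 1.0 if the extracted number equals the true winner, else 0.0."""
    predicted = extract_player_number(response)
    return 1.0 if predicted == answer else 0.0


if __name__ == "__main__":
    print(winner_reward("Player 3 has a flush, but Player 6 wins with a full house.", 6))  # 1.0
    print(winner_reward("I think the winner is player 2.", 5))  # 0.0
```

The last-mention rule is a simple heuristic: a response that names the winner first and then discusses a losing hand would be scored as wrong, so a stricter design might require an explicit answer tag instead.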