{"data":{"kind":"file","path":"README.md","version_id":"hu9ana6egub6jr5phzc85tlr","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5054,"modified_at":"2025-09-09T18:31:55.294000","content_hash":"15a2982c6e5855d16499e4341e3f87009a77c654f76d9eef837afd4723cd4e4d"},"entries":[],"content":"# poker\n\n### Overview\n- **Environment ID**: `poker`\n- **Short description**: Single-turn poker action selection. The model must return only a JSON object with an `action` key: `fold`, `call`, or `raise`.\n- **Tags**: poker, game, single-turn, json, simulation-optional\n\n### Datasets\n- **Primary dataset**: Synthetic, in-memory dataset of repeated prompts built at runtime via `datasets.Dataset.from_list`.\n- **Source links**: N/A (generated locally)\n- **Size**: Controlled by `dataset_size` (default 200 rows)\n\n### Task\n- **Type**: single-turn\n- **Parser**: Custom `PokerJSONParser` (extracts/validates JSON `{\"action\": ...}`)\n- **Rubric overview**:\n  - Format reward (weight 0.2): 1.0 if the output is valid JSON with an allowed action; 0.0 otherwise.\n  - Poker reward (weight 0.8): If `llm_poker` is available and `use_llm_poker=true`, simulates a hand against either deterministic, rule-based opponents (default) or LLM-driven opponents (`opponent_mode: \"llm\"`). Chip delta is mapped to [0, 1]. 
Otherwise, falls back to a shaped stub: `fold`→0.45, `call`→0.50, `raise`→0.55.\n\n### Quickstart\nRun with defaults:\n\n```bash\nuv run vf-eval poker\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval poker \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"dataset_size\": 50, \"use_llm_poker\": false}'\n```\n\nEnable real simulation (optional): ensure a compatible `llm_poker` package/module is available, then:\n\n```bash\nuv run vf-eval poker \\\n  -a '{\"use_llm_poker\": true, \"num_opponents\": 3, \"opp_probs\": [0.3, 0.5, 0.2], \"seed\": 123}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- When `llm_poker` is not installed or errors, the environment automatically falls back to the stub reward.\n\nInstall simulation extra (recommended for sim mode):\n\n```bash\n# Using pip\npip install -e '.[sim]'\n\n# Or with uv's pip shim\nuv pip install -e '.[sim]'\n```\n\nNotes on the simulator dependency:\n- The Python import path is `llm_poker` (underscore). Our `sim` extra installs the `llm_poker` distribution from PyPI.\n- If you prefer installing it directly, run `pip install llm_poker`.\n- If PyPI is unavailable or outdated, install from source: `pip install git+https://github.com/strangeloopcanon/llm-poker`.\n\nOnline LLM opponents (optional):\n\n```bash\nuv run vf-eval poker \\\n  -a '{\n    \"use_llm_poker\": true,\n    \"opponent_mode\": \"llm\",\n    \"opponent_models\": [\"openai:gpt-4o-mini\"],\n    \"num_opponents\": 2\n  }'\n```\n\nThis requires configuring the `llm` Python package used by `llm_poker` with providers/keys. 
See `llm` docs for installing providers (e.g., `llm install openai`) and setting credentials.\n\nLLM provider setup (OpenAI example):\n\n```bash\n# Install the OpenAI provider plugin for `llm`\nllm install openai          # or: pip install llm-openai\n\n# Add your API key (stored by `llm`)\nllm keys set openai\n# Or via environment variable\nexport OPENAI_API_KEY=sk-...\n\n# Verify available model aliases\nllm models\n\n# Use the printed model names in opponent_models, e.g.:\n#   \"opponent_models\": [\"openai:gpt-4o-mini\"]\n```\n\nNotes:\n- Model aliases vary by plugin/version. Prefer the prefix form `provider:model` (e.g., `openai:gpt-4o-mini`) if plain names don’t resolve.\n- For other providers (e.g., Anthropic), install their plugin (e.g., `llm install anthropic`), set keys (`llm keys set anthropic`), and use the corresponding model names.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_size` | int | `200` | Number of repeated prompt rows. |\n| `starting_stack` | int | `10000` | Player starting chips. |\n| `small_blind` | int | `50` | Small blind size. |\n| `big_blind` | int | `100` | Big blind size. |\n| `min_raise` | int | `500` | Default raise amount used when action is `raise`. |\n| `num_opponents` | int | `2` | Number of opponents in sim mode (>=1); rule-based by default, LLM-driven when `opponent_mode` is `llm`. |\n| `use_llm_poker` | bool | `true` | If true, attempt real simulation; otherwise use shaped stub. |\n| `seed` | int | `0` | RNG seed for determinism (deck/opponents). |\n| `opp_probs` | tuple/list[float, float, float] | `[0.2, 0.6, 0.2]` | Opponent mix `(p_fold, p_call, p_raise)` in sim mode. |\n| `opponent_mode` | str | `\"offline\"` | Opponent type: `offline` (deterministic rules) or `llm` (LLM-driven). |\n| `opponent_models` | list[str] | `[]` | Model IDs for LLM opponents (cycled if fewer than `num_opponents`). |\n| `system_prompt` | str | `\"Return only a JSON object. 
No commentary.\"` | System message passed to the model. |\n\nAdditional kwargs are forwarded to `vf.SingleTurnEnv`.\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum of rubric components (default weights: 0.2 format, 0.8 poker). |\n| `format_reward` | 1.0 if output is valid JSON with allowed action; else 0.0. |\n| `poker_reward` | Mapped to [0,1] from chip delta via `0.5 + 0.5 * tanh(delta / (10 * big_blind))`; or stub value if sim disabled/unavailable. |\n\n### Evaluation Reports\nThis section is reserved for auto-generated evaluation artifacts. Do not edit.\n","encoding":"utf-8","truncated":false,"total_bytes":5054},"status":null}