{"data":{"kind":"file","path":"README.md","version_id":"eml2cws65e0mk0s57csa07h0","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1876,"modified_at":"2026-05-06T10:18:05.134000","content_hash":"6c859571df5fc4d4dae596fd6d0b2abfb02be20c9cd3f81ad813ee3eb533b3e8"},"entries":[],"content":"# clbench-poker\n\nContinual Learning Bench's `exploitable_poker` task wrapped as a\n[verifiers](https://github.com/willccbb/verifiers) `MultiTurnEnv`, suitable\nfor Prime Intellect Hosted Training.\n\nThe agent plays heads-up Texas Hold'em against a deterministic exploitable\nopponent (default `calling_station`). Reward is the per-hand chip profit\ndivided by the big blind. Continual-learning value comes from learning the\nopponent's pattern over a sequence of hands within a single rollout.\n\n## Args (passed via `[[env]].args` in the training TOML)\n\n| Arg | Default | Notes |\n|---|---|---|\n| `task_kwargs` | `{num_instances=5, opponent_policy=\"calling_station\", seed=0}` | Forwarded to CLBench's `Poker` constructor. |\n| `max_instances_per_rollout` | `1` | Set ≥ 2 to enable continual mode; required for `use_notepad`. |\n| `use_notepad` | `false` | Adds an `icl_notepad`-style `notepad_update` field to the action schema. |\n| `notepad_max_chars` | `4000` | Soft cap; head-truncated when exceeded. |\n| `max_turns` | `16` | Hard cap for the verifiers rollout loop. Cold-start safe; raise once the policy emits valid actions. |\n| `max_input_tokens_per_rollout` | `8000` | Cumulative input-token cap per rollout. Set `0` to disable. Prevents context-quadratic blowup when the policy emits unparseable text. |\n| `parse_failure_penalty` | `-1.0` | Per-failure reward delta. |\n| `end_on_parse_failure` | `true` | Parse failure ends the rollout immediately. Flip to `false` once the policy reliably produces valid JSON. 
|\n\nSee <https://github.com/sr-networks/clbench-verifiers> for the wrapper source\nand a fuller architecture description.\n\n## Reward\n\nThe reward is `mean_instance_reward` (mean per-hand chip profit / big blind)\nplus a parse-failure penalty. `num_instances_completed`, `num_notepad_updates`,\nand `notepad_length_chars` are tracked as diagnostic-only signals (weight 0)\nand do not contribute to the reward.\n\n## License\n\nApache-2.0\n","encoding":"utf-8","truncated":false,"total_bytes":1876},"status":null}