{"data":{"kind":"file","path":"README.md","version_id":"s6ihh6wz74mw80z4ybwilm6y","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3686,"modified_at":"2025-12-01T22:20:34.552000","content_hash":"de2d2839c495826279f4a6e283bd435935678226582e89878c75c4b238f20152"},"entries":[],"content":"# blackjack\n\n### Overview\n- **Environment ID**: `blackjack`\n- **Short description**: Multi-turn Blackjack card game where an LLM agent plays against a dealer, deciding when to HIT or STICK to get as close to 21 as possible without going over.\n- **Tags**: games, multi-turn, cards, reasoning, xml, decision-making\n\n### Datasets\n- **Primary dataset(s)**: Self-generated episodes (no external dataset required)\n- **Source links**: Classic RL environment (OpenAI Gym / Gymnasium)\n- **Split sizes**: Number of episodes controlled via args\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `action` field\n- **Rubric overview**: Win reward, no-bust reward, total reward, and format check\n\n### Game Description\n\nThe agent plays Blackjack against a dealer:\n\n**Card Values:**\n- Number cards (2-10): Face value\n- Face cards (J, Q, K): 10\n- Aces: 1 or 11 (whichever is better for the hand)\n\n**Actions (2 total):**\n- `HIT`: Draw another card\n- `STICK`: Stand with current hand and let the dealer play\n\n**Game Flow:**\n1. Player receives 2 cards, dealer shows 1 card (one hidden)\n2. Player can HIT (draw cards) or STICK (stand)\n3. If player goes over 21, they BUST and lose immediately\n4. When player STICKs, dealer reveals hidden card and plays\n5. Dealer must HIT on 16 or below, STICK on 17+\n6. If dealer busts, player wins\n7. 
Otherwise, higher total wins (tie is a push/draw)\n\n**Rewards:**\n- Win: **+1.0**\n- Loss: **-1.0**\n- Draw/Push: **0.0**\n- Bust: **-1.0**\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval blackjack\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval blackjack \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 1000, \"num_eval_examples\": 20, \"natural\": false}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `1000` | Number of training episodes |\n| `num_eval_examples` | int | `20` | Number of evaluation episodes |\n| `natural` | bool | `false` | Whether natural blackjack (21 on first 2 cards) pays 1.5x |\n| `max_steps` | int | `20` | Maximum steps per episode |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `_win_reward_func` | 1.0 if player wins, 0.0 for draw, -0.5 for loss |\n| `_no_bust_reward_func` | 1.0 if player doesn't bust (stays at or below 21) |\n| `_total_reward_func` | Normalized game reward mapped from [-1, 1] to [0, 1] |\n| `format_reward` | Adherence to expected XML format (weight 0.1) |\n\n### Example Interaction\n\n**System prompt** instructs the agent on rules, card values, and format.\n\n**Initial state**:\n```\n=== Blackjack Game State ===\nYour Hand: 7, 8\nYour Total: 15\nDealer Showing: 10 (Value: 10)\nSteps taken: 0\n============================\n```\n\n**Agent response**:\n```\nI have a 7 and 8 for a total of 15. The dealer shows a 10, which is strong.\nWith 15 against a dealer's 10, the dealer likely has 17-20, so I need to improve.\nIf I hit, I bust with any card worth 7 or more (7,8,9,10,J,Q,K = 7/13 chance; an ace counts as 1 and is safe).\nBut standing on 15 against a 10 is also likely to lose.\nI'll take the risk and hit.\n<action>HIT</action>\n```\n\n### Strategy Tips\n\n1. **Hard totals**: If your total is 17+, usually STICK. 
If 11 or less, always HIT (can't bust).\n2. **Soft hands**: With a usable ace, you have more flexibility to hit without busting.\n3. **Dealer's card matters**: \n   - Dealer shows 7+: They likely have a strong hand (17-20)\n   - Dealer shows 4-6: They're more likely to bust (must hit on 16 or below)\n4. **Basic strategy**: \n   - Hit on 12-16 if dealer shows 7+\n   - Stand on 12-16 if dealer shows 2-6\n   - Always hit on 11 or less\n   - Always stand on 17+\n","encoding":"utf-8","truncated":false,"total_bytes":3686},"status":null}