{"data":{"kind":"file","path":"README.md","version_id":"s3z8hgqe3nwl72rh0hjyst43","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2818,"modified_at":"2025-08-27T12:36:37.926000","content_hash":"f4e025dccfc36321fed5e68e6d775d03c5c231a913f5cd300c29ad1e80565e1f"},"entries":[],"content":"# hud-text-2048\n\n### Overview\n- **Environment ID**: `hud-text-2048`\n- **Short description**: Text-based 2048 game for agents to reach target tiles through strategic moves\n- **Tags**: `game`, `strategy`, `text`, `2048`\n\n### Datasets\n- **Primary dataset(s)**: Built-in tasks with varying difficulty (64 to 512 tiles)\n- **Source links**: Generated programmatically in environment loader\n- **Split sizes**: 4 tasks (you can adjust these)\n\n### Task\n- **Type**: multi-turn, tool use\n- **Parser**: ToolXMLParser with action validation\n- **Rubric overview**: Task completion (80%), format compliance (15%), tool execution (5%)\n\n### Quickstart\n\nMake sure to export your `HUD_API_KEY` (you can generate one [here](https://app.hud.so)).\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval hud-text-2048\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval hud-text-2048 \\\n  -m gpt-4.1-mini \\\n  -n 1 -r 3 \\\n  -a '{\"max_turns\": 150}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_turns` | int | `100` | Maximum moves allowed per game |\n| `system_prompt` | str | (see config) | Override agent instructions |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of task completion, format compliance, tool execution) |\n| `task_completion` | Logarithmic scale based on highest tile vs target (min(1.0, log(highest)/log(target))) |\n| `tool_execution` | Ratio of successful tool calls |\n| 
`format_compliance` | Score for correct XML formatting and action syntax |\n\n### How It Works\n\nThis environment uses `hud-vf-gym`, an adapter that bridges HUD's MCP infrastructure with the Verifiers RL framework:\n\n1. **Environment Loader** (`hud_text_2048.py`):\n   - Defines 6 tasks with increasing difficulty (64 to 2048 tiles)\n   - Creates a Verifiers-compatible dataset with prompts and MCP configurations\n   - Points to the `hudevals/hud-text-2048` Docker image\n\n2. **MCP Integration** (via `hud-vf-gym`):\n   - Spawns Docker container running the text-2048 MCP server\n   - Executes setup tools to initialize game state\n   - Translates agent actions to MCP tool calls\n   - Returns text-based observations after each move\n   - Runs evaluation tools to compute rewards\n\n3. **Action Mapping** (`config.yaml`):\n   - Maps high-level agent actions (`left()`, `right()`) to MCP's `move` tool\n   - Uses declarative YAML configuration\n\n### Relevant Links\n\n- [HUD Documentation](https://docs.hud.so)\n- [Build Your Own Environment](https://docs.hud.so/build-environments)\n- [HUD Python SDK](https://github.com/hud-evals/hud-python)\n- [HUD VF Gym Adapter](https://github.com/hud-evals/hud-vf-gym)","encoding":"utf-8","truncated":false,"total_bytes":2818},"status":null}