{"data":{"kind":"file","path":"README.md","version_id":"zclk5qmcdwyki6r1qylm5z5o","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2728,"modified_at":"2026-01-19T19:05:28.421000","content_hash":"a63bbe6f68947d2d73984fa4bb23e30d4dbceb3342ada2fbfe41fa37cf1a6ce5"},"entries":[],"content":"# textworld_cooking_env\n\n### Overview\n- **Environment ID**: `textworld_cooking_env`\n- **Short description**: Multi-turn text-based cooking game environment built on TWCooking; the reward is the cumulative score across game steps.\n- **Tags**: games, multi-turn, textworld, cooking, interactive-fiction, xml\n\n### Datasets\n- **Primary dataset(s)**: TWCooking dataset from GATA-public (auto-downloaded on first run)\n- **Source links**: [GATA-public releases](https://github.com/xingdi-eric-yuan/GATA-public/releases)\n- **Split sizes**: Train and test splits available at multiple difficulty levels (1-10)\n\n### Difficulties\n\nThe TextWorld cooking environment is designed around a progression of difficulties. As the difficulty increases, the agent must include more items in the recipe, navigate more rooms, or prepare each item differently. 
The table below summarizes how the difficulty levels differ:\n\n| Difficulty | Recipe Size (Num Items) | Num Locations | Max Score | Need Cut | Need Cook |\n| --- | ---- | ---- | ---- | ---- | ---- |\n| 1 | 1 | 1 | 3 | ✗ | ✗ |\n| 2 | 1 | 1 | 4 | ✗ | ✓ |\n| 3 | 1 | 1 | 4 | ✓ | ✗ |\n| 4 | 1 | 6 | 3 | ✗ | ✗ |\n| 5 | 1 | 9 | 3 | ✗ | ✗ |\n| 6 | 1 | 12 | 3 | ✗ | ✗ |\n| 7 | 1 | 1 | 5 | ✓ | ✓ |\n| 8 | 3 | 6 | 5 | ✗ | ✗ |\n| 9 | 3 | 6 | 11 | ✓ | ✓ |\n| 10 | 3 | 12 | 11 | ✓ | ✓ |\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with `action` field\n- **Rubric overview**: `EpisodicSumRubric` sums step rewards across the game trajectory\n\n### How it works\n- **Data preparation**: Downloads and extracts TWCooking game files to `~/.cache/textworld/tw-cooking/` on first run.\n- **Game setup**: Each episode loads a `.z8` game file with TextWorld, providing observations, walkthroughs, and command templates.\n- **Agent interaction**: The agent receives text observations and must provide actions in `<action></action>` XML tags (e.g., `<action>take knife from counter</action>`).\n- **Scoring**: Each step's reward is the change in game score since the previous step; the final reward is the sum of all step rewards.\n\n### Quickstart\n\nExample eval script:\n\n```bash\nvf-eval -n -1 -s -v -k \"<API key>\" \\\n  -b \"http://0.0.0.0:8000/v1\" \\\n  -m \"Qwen/Qwen3-4B-Thinking-2507\" \\\n  --env-args '{\"max_turns\": 10,\"difficulties\": [1]}' \\\n  textworld_cooking_env\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `difficulties` | List[int] | `[1]` | List of difficulty levels (1-10) to include in dataset |\n| `system_prompt` | str | *(built-in)* | Custom system prompt for the agent |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Cumulative sum of step rewards (delta scores) across the episode |\n\n","encoding":"utf-8","truncated":false,"total_bytes":2728},"status":null}