{"data":{"kind":"file","path":"README.md","version_id":"ksbxi70ycjevleywfsvatvus","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2797,"modified_at":"2025-10-28T02:17:01.835000","content_hash":"94ae5b69a9bae064938579d6644f927e2d2a72acce704f8f42e8658bf86fb351"},"entries":[],"content":"# PuzzleJAX Environment\r\n\r\nA verifiers-compatible environment for PuzzleScript games using [PuzzleJAX](https://github.com/smearle/script-doctor).\r\n\r\nPuzzleJAX is a GPU-accelerated implementation of PuzzleScript that enables efficient training and evaluation of agents on grid-based puzzle games like Sokoban.\r\n\r\n## Setup\r\n\r\n### Installation\r\n\r\nSimply install the environment using the verifiers CLI:\r\n\r\n```bash\r\nuv run vf-install puzzlescript_games\r\n```\r\n\r\nThis will automatically install PuzzleJAX from GitHub along with all dependencies.\r\n\r\n### Windows Users\r\n\r\nFor proper Unicode support in console output, set the encoding:\r\n\r\n```powershell\r\n$env:PYTHONIOENCODING = \"utf-8\"\r\n```\r\n\r\n## Usage\r\n\r\n### Basic Evaluation\r\n\r\n```bash\r\n# Basic evaluation with default Sokoban game\r\nuv run vf-eval puzzlescript_games -n 5 -m gpt-4o-mini\r\n\r\n# Different game and level with rendering\r\nuv run vf-eval puzzlescript_games -n 5 -m gpt-4o-mini --env-args '{\"game_name\":\"sokoban_basic\",\"level_i\":1,\"render_mode\":\"save\"}'\r\n```\r\n\r\n### Available Games\r\n\r\nThe environment supports any PuzzleScript game in PuzzleJAX's game database. Popular games include:\r\n\r\n- `sokoban_basic` - Classic box-pushing puzzle\r\n- `wordle` - Word guessing game  \r\n- `limerick` - Poetry-themed puzzle\r\n- `atlas shrank` - Spatial reasoning puzzle\r\n- `notsnake` - Snake variant\r\n- Many more...\r\n\r\nSee the [PuzzleJAX repository](https://github.com/smearle/script-doctor) for the full list of available games.\r\n\r\n## How It Works\r\n\r\n### Architecture\r\n\r\nThe environment wraps PuzzleJAX's game engine with verifiers' `MultiTurnEnv` interface:\r\n\r\n1. **Game Loading**: Parses PuzzleScript `.txt` files using Lark grammar\r\n2. **ASCII Rendering**: Converts game state to text representation for LLMs\r\n3. **Action Parsing**: Extracts actions from LLM responses using XML tags\r\n4. **State Management**: Maintains JAX game state and synchronizes with verifiers state dict\r\n5. **Reward Calculation**: Multiple reward functions (win, progress, efficiency)\r\n\r\n### LLM Interface\r\n\r\nThe LLM sees ASCII representations like:\r\n\r\n```\r\nAfter moving right:\r\n\r\nLEGEND:\r\n# = wall\r\n@ = player\r\n$ = crate\r\n. = target\r\n* = crate on target\r\n\r\nMAP:\r\n#####\r\n#@$.#\r\n#####\r\n\r\n✓ Good move! You're getting closer to the solution.\r\n\r\nWhat's your next move?\r\n```\r\n\r\nThe LLM responds with:\r\n\r\n```xml\r\nI should push the crate onto the target square.\r\n<action>right</action>\r\n```\r\n\r\n### Reward Functions\r\n\r\nThe environment includes multiple reward signals:\r\n\r\n- **Win Reward** (1.0): For solving the puzzle\r\n- **Progress Reward** (0-0.3): Based on heuristic improvement (distance to goal)\r\n- **Efficiency Reward** (0-0.2): Solving with fewer moves\r\n- **Valid Action Reward** (-0.1 per invalid): Penalty for malformed actions\r\n- **Format Reward** (0-0.2): Using correct XML tags","encoding":"utf-8","truncated":false,"total_bytes":2797},"status":null}