{"data":{"kind":"file","path":"README.md","version_id":"lf4b2rz2vo29egwwn1yh1sc6","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4567,"modified_at":"2025-11-21T23:06:36.877000","content_hash":"bba6650f7d04b5f57b850fa5a799ae0bb0bc2a05b41bf9d8f1568757bca18672"},"entries":[],"content":"# Pacman Environment\r\n\r\nA multimodal Pacman environment for reinforcement learning training with verifiers and prime-rl.\r\n\r\n## Features\r\n\r\n- **Multimodal**: Combines textual game state with visual game frames\r\n- **Stateless Model Context**: Each turn shows only the current observation (no conversation history)\r\n- **Multi-turn Environment**: The environment handles multiple sequential turns internally\r\n- **10x10 Grid**: Fixed maze layout with walls and strategic pellet placement\r\n- **Smart Ghosts**: 2 ghosts using Manhattan distance-based pathfinding\r\n- **Power Pellets**: 4 power pellets that allow eating ghosts for 10 turns\r\n- **Scaled Rewards**: RL rewards are game points scaled by 1/100 (e.g. +10 points gives +0.1)\r\n- **Action Validation**: Only accepts UP, DOWN, LEFT, RIGHT actions\r\n\r\n## Game Rules\r\n\r\n- **Objective**: Eat all pellets while avoiding ghosts\r\n- **Controls**: UP, DOWN, LEFT, RIGHT (case-insensitive)\r\n- **Scoring**:\r\n  - Pellet: +10 points (+0.1 RL reward)\r\n  - Power pellet: +50 points (+0.5 RL reward)\r\n  - Ghost eaten: +100 points (+1.0 RL reward)\r\n- **Power Mode**: Eating a power pellet lets Pacman eat ghosts for the next 10 turns\r\n- **Termination**: The game ends when Pacman is caught, all pellets are eaten, or 100 turns are reached\r\n\r\n## Installation\r\n\r\n### Local Development\r\n```bash\r\n# Install in editable mode for development\r\nuv pip install -e pacman/\r\n```\r\n\r\n### From Environment Hub\r\n```bash\r\n# Install from Prime Intellect Environment Hub\r\nprime env install primeintellect/pacman\r\n```\r\n\r\n## Usage\r\n\r\n### Basic Testing\r\n```python\r\nimport pacman\r\n\r\n# Create environment\r\nenv = pacman.Environment()\r\n\r\n# 
Test import\r\nprint(\"Pacman environment loaded successfully!\")\r\n```\r\n\r\n### With Prime-RL\r\n\r\nAdd to your orchestrator config:\r\n```toml\r\n[[env]]\r\nid = \"pacman\"\r\n```\r\n\r\n### Evaluation\r\n```bash\r\n# Evaluate with vf-eval (the verifiers evaluation CLI)\r\nuv run vf-eval pacman -m your-model-name -b http://localhost:8000/v1 -n 10 --max-tokens 1024\r\n```\r\n\r\n#### Rollout Logging\r\n\r\nWhen you run evaluations, rollouts are automatically saved to disk in the following location:\r\n```\r\noutputs/evals/step_{checkpoint_number}/pacman/\r\n```\r\n\r\nEach rollout contains:\r\n- **Complete game trajectory**: All moves, rewards, and game states\r\n- **Game frames**: PIL Images showing the visual state at each turn (400x400 pixels)\r\n- **Base64 images**: Base64-encoded copies of the frames for use in API requests\r\n- **Detailed metadata**: Scores, turn counts, win/loss status, actions taken\r\n\r\nThe saved data includes both the textual game information and visual frames, allowing you to replay and visualize entire game sessions for analysis and debugging.\r\n\r\n## Environment Specifications\r\n\r\n### Observations\r\nEach turn provides:\r\n- **Text**: Pacman position, ghost positions, pellet counts, score, turn count, power timer\r\n- **Image**: 400x400 pixel rendering of the game grid showing:\r\n  - Gray walls\r\n  - White pellets/power pellets\r\n  - Yellow Pacman\r\n  - Red ghosts\r\n\r\n### Actions\r\n- `\"UP\"`, `\"DOWN\"`, `\"LEFT\"`, `\"RIGHT\"` (case-insensitive)\r\n- Invalid actions result in no movement, but the game continues\r\n\r\n### Rewards\r\n- Pellet eaten: +0.1\r\n- Power pellet eaten: +0.5\r\n- Ghost eaten: +1.0\r\n- Invalid action: 0.0\r\n\r\n### Episode Termination\r\n- Pacman caught by a ghost\r\n- All pellets collected (win condition)\r\n- Maximum 100 turns reached\r\n\r\n## Architecture\r\n\r\n- **`environment.py`**: Main Environment class implementing the verifiers interface\r\n- **`game.py`**: Core game logic, state management, and ghost AI\r\n- **`renderer.py`**: Image 
generation and base64 encoding for multimodal observations\r\n- **`utils.py`**: Helper functions for action parsing and message formatting\r\n\r\n## Development\r\n\r\n### Running Tests\r\n```bash\r\n# Install test dependencies\r\nuv pip install pytest\r\n\r\n# Run tests\r\ncd pacman\r\nuv run pytest\r\n```\r\n\r\n### Building for Distribution\r\n```bash\r\n# Build package\r\ncd pacman\r\nuv build\r\n\r\n# Upload to hub\r\nprime env upload pacman/\r\n```\r\n\r\n## Configuration\r\n\r\nThe environment accepts the following argument:\r\n- `seed` (int, optional): Random seed for reproducible gameplay\r\n\r\nExample with custom seed:\r\n```python\r\nenv = pacman.Environment(seed=42)\r\n```\r\n\r\n## Multimodal Integration\r\n\r\nGame frames are automatically converted to base64 and included in OpenAI API calls. The system prompt explains the valid actions and game rules. Unlike traditional multi-turn environments, Pacman uses a **stateless approach**: each turn contains only the system prompt and the latest observation (text + image), with no conversation history. This is sufficient because the current observation carries everything needed to choose a move: Pacman and ghost positions, pellet counts, score, turn count, and power timer.\r\n","encoding":"utf-8","truncated":false,"total_bytes":4567},"status":null}