{"data":{"kind":"file","path":"README.md","version_id":"aye0cai3ije7lz4czzlbes7p","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3991,"modified_at":"2025-08-25T17:49:34.520000","content_hash":"09b7471ac0d11d8003f319a94b6ade96ab72481dc372ed67d7fdfd6d57a71829"},"entries":[],"content":"# Prime Arcade RL Environments\r\n\r\n### Overview\r\n- **Project**: Prime Arcade RL — LLM-Guided Reward Design\r\n- **Short description**: A collection of classic game environments designed for research into LLM-generated reward functions for reinforcement learning agents.\r\n- **Tags**: `reinforcement-learning`, `reward-design`, `llm`, `classic-games`, `pygame`, `python`\r\n\r\n### Dependencies\r\nThe environments are self-contained and only require `pygame`.\r\n```bash\r\npip install pygame\r\n```\r\n\r\n### How it Works\r\nThis project uses a unified approach:\r\n1.  **Game Engine**: Each environment is a self-contained `pygame` application. This removes external dependencies and provides full control over the game logic, state, and rendering.\r\n2.  **Rendering**: Pygame is used to create and manage the game window, allowing for a consistent look and feel, including the ability to overlay live training metrics directly onto the game screen. The main training loop is responsible for handling Pygame events (like closing the window).\r\n3.  **Reward Engine**: A custom `RewardEngine` interprets a JSON configuration to calculate rewards based on in-game events, allowing for flexible and dynamic reward shaping experiments.\r\n\r\n### Available Environments\r\nThis suite contains several game environments, each chosen to test different aspects of RL agent performance.\r\n\r\n| Environment  | ID            | Dynamics                      | Key Challenge                      |\r\n|--------------|---------------|-------------------------------|------------------------------------|\r\n| **Pong**     | `Pong-v0`     | Symmetric, 2-player           | Fast reaction, opponent prediction |\r\n| **Breakout** | `Breakout-v0` | Single-player, block breaking | Sparse rewards, long-term strategy |\r\n| **CartPole** | `CartPole-v0` | Classic control, balancing    | Dense reward shaping, stability    |\r\n\r\n### Task\r\n- **Type**: Reinforcement Learning\r\n- **Parser**: The reward signal is parsed by a custom `RewardEngine` based on a JSON specification.\r\n- **Rubric overview**: The core of the experiment is to compare agent performance under different reward rubrics: hand-crafted, sparse, and LLM-generated. Key metrics include sample efficiency, final score, and training stability.\r\n\r\n### Quickstart\r\nThe primary way to run an experiment is through the Go-based TUI.\r\n\r\n1.  **Navigate to the project root directory.**\r\n2.  **Run the TUI batch script:**\r\n    ```bash\r\n    .\\run-tui.bat\r\n    ```\r\n3.  **Follow the on-screen menus** to select a game, agent, and reward mode. The TUI will then execute the Python training script with the correct configuration.\r\n\r\n### Environment Arguments\r\nEnvironment-specific settings (like opponent difficulty in Pong) are configured within the Python scripts, typically managed by the `ExperimentConfig` object. The TUI provides a high-level interface to select pre-configured experiment conditions.\r\n\r\n### Metrics\r\nThe training script logs several key metrics to `results.csv` and displays live plots.\r\n\r\n| Metric              | Meaning                                                                         |\r\n|---------------------|---------------------------------------------------------------------------------|\r\n| `episode_reward`    | The total reward accumulated in an episode. The primary measure of performance. |\r\n| `final_score`       | The game's actual score, independent of shaping rewards.                        |\r\n| `epsilon`           | The agent's current exploration rate.                                           |\r\n| `stability`         | Measured by the variance of returns across training episodes.                   |\r\n| `sample_efficiency` | The number of episodes required to reach a target score.                        |\r\n\r\n## Evaluation Reports\r\n\r\n<!-- This section can be used for summaries or links to generated plots and log files. -->\r\n<p>No reports generated yet. Run an experiment using <code>run-tui.bat</code> to produce logs and checkpoints.</p>\r\n","encoding":"utf-8","truncated":false,"total_bytes":3991},"status":null}