{"data":{"kind":"file","path":"README.md","version_id":"oo3evxrmx4pcs9x7c8j907en","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":9897,"modified_at":"2025-11-20T19:33:05.732000","content_hash":"22beb075dc495a8d5ed33a1021360f5d2d97fc9c43c8626cdf5f508d3e8df01c"},"entries":[],"content":"# Balatro RL Environment\n\nA Gymnasium-compatible Reinforcement Learning environment for Balatro, optimized for use with the [Prime Intellect Environment Hub](https://www.primeintellect.ai/blog/environments).\n\n## Overview\n\nThis environment simulates the poker roguelike game Balatro, where players must strategically play poker hands to reach target scores across multiple rounds (blinds). The environment follows the Gymnasium API standard and is designed for training RL agents.\n\n## Features\n\n- **Gymnasium-Compatible**: Full implementation of the Gymnasium environment interface\n- **Rich State Space**: Includes hand cards, game resources, and scoring information\n- **Flexible Actions**: Support for playing and discarding cards\n- **Poker Hand Evaluation**: Accurate evaluation of poker hands with Balatro-specific rules\n- **Joker System**: Framework for joker effects (see [DATA_SOURCES.md](DATA_SOURCES.md) for getting full joker data)\n- **Progression System**: Antes, blinds, and increasing difficulty\n- **Prime Intellect Ready**: Packaged for easy integration with Prime Intellect Environment Hub\n\n## ⚠️ Important: Card Data\n\n**The current implementation includes a simplified card system with example jokers.** \n\nTo get the **full Balatro card data** including all jokers, tarot cards, and other items, see **[DATA_SOURCES.md](DATA_SOURCES.md)** for detailed instructions on:\n- Extracting data from community sources (Balatro HQ, Wiki, etc.)\n- Using the data loader utilities\n- Formatting and importing card data\n\n## Installation\n\n### Local Installation\n\n```bash\n# Clone the repository\ngit clone <your-repo-url>\ncd balatro\n\n# Install in development mode\npip install -e .\n```\n\n### Using Prime Intellect CLI\n\n```bash\n# Install Prime CLI (if not already installed)\npip install prime-cli\n\n# Install the environment from the hub\nprime env install balatro-env\n\n# Or upload your local version\nprime env upload .\n```\n\n## Quick Start\n\n### Basic Usage\n\n```python\nimport gymnasium as gym\nimport balatro_env\n\n# Create the environment\nenv = gym.make(\"Balatro-v0\", render_mode=\"human\")\n\n# Reset the environment\nobservation, info = env.reset(seed=42)\n\n# Run a simple episode\nterminated = False\ntruncated = False\ntotal_reward = 0\n\nwhile not (terminated or truncated):\n    # Sample a random action\n    action = env.action_space.sample()\n    \n    # Take a step\n    observation, reward, terminated, truncated, info = env.step(action)\n    total_reward += reward\n    \n    # Render the environment\n    env.render()\n\nprint(f\"Episode finished with total reward: {total_reward}\")\nenv.close()\n```\n\n### Loading Joker Data\n\n```python\nfrom balatro_env.joker import JokerDatabase\nfrom balatro_env.data_loader import BalatroDataLoader\n\n# Load jokers from data file\nloader = BalatroDataLoader()\njokers_data = loader.load_jokers()\n\n# Use joker database\njoker_db = JokerDatabase()\nall_jokers = joker_db.get_all_jokers()\n\n# Get specific joker\njoker = joker_db.get_joker(\"Joker\")\n```\n\n### Creating Data Templates\n\n```bash\n# Create template files for manual data entry\npython scripts/fetch_balatro_data.py templates\n\n# Validate your data\npython scripts/fetch_balatro_data.py validate\n\n# See instructions for getting data\npython scripts/fetch_balatro_data.py instructions\n```\n\n## Environment Details\n\n### Observation Space\n\nThe observation is a dictionary with two components:\n\n1. **hand**: A (8, 4) array representing the 8 cards in hand\n   - Column 0: Rank (2-14, where 11=Jack, 12=Queen, 13=King, 14=Ace)\n   - Column 1: Suit (0=Hearts, 1=Diamonds, 2=Clubs, 3=Spades)\n   - Column 2: Enhancement (0=None, 1=Bonus, 2=Mult, etc.)\n   - Column 3: Edition (0=None, 1=Foil, 2=Holographic, 3=Polychrome)\n\n2. **game_state**: A (7,) array with:\n   - Money\n   - Hands remaining\n   - Discards remaining\n   - Current score\n   - Target score\n   - Round number\n   - Ante\n\n### Action Space\n\nThe action is a dictionary with two components:\n\n1. **action_type**: Discrete(2)\n   - 0: Play selected cards\n   - 1: Discard selected cards\n\n2. **card_selection**: MultiBinary(8)\n   - Binary mask indicating which cards to play/discard\n   - Example: [1, 1, 0, 0, 0, 0, 0, 0] selects first two cards\n\n### Rewards\n\n- **Playing cards**: Reward proportional to score achieved (score / 100)\n- **Good poker hands**: Bonus rewards for Flush, Full House, Four of a Kind, etc. (+5)\n- **Defeating blind**: Large bonus (+50)\n- **Invalid actions**: Small penalty (-1)\n- **Losing**: Large penalty (-100)\n- **Running out of hands**: Penalty (-50)\n\n### Episode Termination\n\nAn episode terminates when:\n- The player fails to beat a blind (game over)\n- The maximum number of rounds is reached\n- The player runs out of hands without beating the blind\n\n## Game Rules (Simplified)\n\n1. **Objective**: Score enough chips to beat each blind's target score\n2. **Resources**: Limited hands (plays) and discards per blind\n3. **Poker Hands**: Standard poker hands with Balatro-specific additions:\n   - Flush Five (5 of a kind in same suit)\n   - Flush House (Full House in same suit)\n4. **Jokers**: Modify chips and multipliers (see [DATA_SOURCES.md](DATA_SOURCES.md))\n5. **Progression**: Difficulty increases with each ante\n6. **Rewards**: Earn money for beating blinds (future: spend in shop)\n\n## Training Strategies\n\n### Curriculum Learning\nThe environment includes utilities for curriculum learning to address the sparse reward problem:\n\n- **Biased card draws**: Start training with flush-friendly or straight-friendly hands\n- **Progressive difficulty**: Gradually increase to random draws\n- See `examples/curriculum_learning_example.py` for usage\n\n### Action Space Pruning\nReduce the action space from 512 to ~10-20 promising actions:\n\n- **Scoring hands**: Pre-select best card combinations for each hand type\n- **Chase hands**: Identify cards to discard to improve toward target hands\n- See `balatro_env/action_pruner.py` for implementation\n\n### Training Insights\nSee [TRAINING_INSIGHTS.md](TRAINING_INSIGHTS.md) for:\n- Problems identified (sparse rewards, action complexity)\n- Proposed solutions (curriculum learning, supervised pre-training, etc.)\n- References to related repositories\n\n## Project Structure\n\n```\nbalatro/\n├── balatro_env/\n│   ├── __init__.py          # Environment registration\n│   ├── balatro_env.py       # Main Gymnasium environment\n│   ├── card.py              # Card and deck classes\n│   ├── poker_hands.py       # Poker hand evaluation\n│   ├── game_state.py        # Game state management\n│   ├── joker.py             # Joker system\n│   ├── action_pruner.py     # Action space pruning utilities\n│   ├── curriculum.py        # Curriculum learning utilities\n│   ├── data_loader.py       # Data loading utilities\n│   └── data/                # Card data files (JSON)\n│       ├── jokers.json      # Joker definitions\n│       ├── cards.json       # Card enhancements/editions\n│       └── blinds.json      # Blind definitions\n├── examples/\n│   ├── random_agent.py      # Random agent example\n│   └── train_ppo.py         # PPO training example\n├── scripts/\n│   └── fetch_balatro_data.py # Data fetching utilities\n├── tests/\n│   └── test_env.py          # Unit tests\n├── pyproject.toml           # Package configuration\n├── README.md                # This file\n├── DATA_SOURCES.md          # How to get real Balatro data\n└── .gitignore              # Git ignore file\n```\n\n## Getting Real Balatro Data\n\nThe environment includes a framework for jokers and cards, but you need to provide the actual data. See **[DATA_SOURCES.md](DATA_SOURCES.md)** for:\n\n- Community resources (Balatro HQ, Wiki, etc.)\n- Data extraction methods\n- JSON format specifications\n- Scripts to help organize data\n\nQuick start:\n```bash\npython scripts/fetch_balatro_data.py templates\npython scripts/fetch_balatro_data.py instructions\n```\n\n## Development\n\n### Running Tests\n\n```bash\n# Install development dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest tests/\n```\n\n### Code Quality\n\n```bash\n# Format code\nblack balatro_env/\n\n# Lint code\nflake8 balatro_env/\n\n# Type checking\nmypy balatro_env/\n```\n\n## Prime Intellect Environment Hub Integration\n\nThis environment is designed to work seamlessly with the Prime Intellect Environment Hub:\n\n1. **Standard Interface**: Follows Gymnasium API for compatibility\n2. **Packaged Module**: Includes `pyproject.toml` for easy installation\n3. **Scalable**: Works with distributed RL trainers\n4. **Reproducible**: Supports seeding for deterministic episodes\n\n### Uploading to Prime Intellect Hub\n\n```bash\n# Make sure you're logged in\nprime login\n\n# Upload the environment\nprime env upload . --name balatro-env --description \"Balatro RL Environment\"\n\n# View your environment on the hub\nprime env list\n```\n\n## Future Enhancements\n\n- [x] Joker system framework\n- [ ] Full joker data integration\n- [ ] Shop system with purchasing items\n- [ ] Tarot cards and planet cards\n- [ ] Boss blind special effects\n- [ ] Graphical rendering (pygame)\n- [ ] More sophisticated reward shaping\n- [ ] Pre-trained baseline models\n- [ ] Tournament evaluation mode\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\nIf you have extracted Balatro card data, consider sharing it to help improve the environment!\n\n## License\n\nMIT License - see LICENSE file for details\n\n## Acknowledgments\n\n- Inspired by [Balatro](https://www.playbalatro.com/) by LocalThunk\n- Built with [Gymnasium](https://gymnasium.farama.org/)\n- Designed for [Prime Intellect Environment Hub](https://www.primeintellect.ai/)\n\n## Citation\n\nIf you use this environment in your research, please cite:\n\n```bibtex\n@software{balatro_env,\n  title = {Balatro RL Environment},\n  author = {Balatro RL Team},\n  year = {2025},\n  url = {https://github.com/yourusername/balatro-env}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":9897},"status":null}