{"data":{"kind":"file","path":"README.md","version_id":"po9kkeql55nuibpgbk0dui4p","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2607,"modified_at":"2025-12-28T19:07:52.332000","content_hash":"46ebd9c90512e521e9128e89c179279cb51056dedeff144323acbfc10927168f"},"entries":[],"content":"# CA Gas Environment\n\nA reinforcement learning environment for training LLMs to maximize cellular automata activity.\n\n## Task\n\nThe agent interacts with a Game of Life simulation over N generations. Each turn, the agent can flip 0-2 cells. The goal is to maximize the total number of cell state changes while keeping the pattern contained within reasonable bounds.\n\n## Reward Function\n\n```\nreward = total_cells_changed / final_bounds_area\n```\n\nWhere:\n- `total_cells_changed` = sum of all cells that changed state across all generations (from CA rules + agent flips)\n- `final_bounds_area` = area of the bounding box containing all cells that were ever alive\n\nThis encourages:\n- **High activity** (\"gas-like\" dynamics with lots of state changes)\n- **Contained patterns** (runaway gliders expand bounds, reducing reward density)\n\n## Tools\n\nThe agent has access to one tool:\n\n```python\ndef flip_cell(x: int, y: int) -> str:\n    \"\"\"Toggle a cell's state at position (x, y).\"\"\"\n```\n\nThe agent can call this 0-2 times per generation.\n\n## Observation Format\n\nThe agent receives an ASCII grid representation:\n\n```\nGeneration: 42/500\nPopulation: 47 | Changes: 23 | Bounds: 156\n\n································\n···············█················\n··············███···············\n·············█·█·█··············\n··············███···············\n···············█················\n································\n```\n\n## Installation\n\n```bash\n# From Prime Environments Hub\nprime env install hypergraph/ca-gas\n\n# Or local development\nuv pip install -e .\n```\n\n## Evaluation\n\n```bash\n# Quick test\nuv run vf-eval ca-gas -m gpt-4o-mini -n 5 -r 1\n\n# Full evaluation\nuv run vf-eval ca-gas \\\n  -m gpt-4o \\\n  -n 100 -r 3 \\\n  -a '{\"grid_size\": 64, \"num_generations\": 500, \"max_flips_per_turn\": 2}'\n```\n\n## Configuration\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `grid_size` | 64 | Size of the observation viewport |\n| `num_generations` | 500 | Episode length |\n| `max_flips_per_turn` | 2 | Max cell flips per generation |\n| `reward_formula` | `\"changes / area\"` | Custom reward formula |\n\n## Integration with Soft-Machine\n\nThis environment is designed to integrate with the Soft-Machine platform. The reward function can be configured via the RL Agent panel in the browser UI.\n\nFuture versions will use the optimized Hashlife CA engine for faster simulation.\n\n","encoding":"utf-8","truncated":false,"total_bytes":2607},"status":null}