# Ghost Fixer

Procedurally generated debugging environment for training code repair agents.

## Overview

- **Environment ID**: `ghost-fixer`
- **Type**: Multi-turn tool-use environment (SandboxEnv)
- **Tags**: `debugging`, `procedural`, `sandbox`, `train`, `eval`

Ghost Fixer creates **infinite, unique** Python repositories with a single injected bug ("Ghost"). Unlike static datasets, each repository is procedurally generated at runtime using a configurable DAG-based algorithm. The agent must explore the codebase, trace execution, and patch the bug to make pytest pass.

## Key Features

- **Infinite Dataset**: New repositories generated per rollout using seed-based determinism
- **Parametric Difficulty**: 4 levels controlling codebase depth, breadth, and noise
- **Absolute Verifiability**: Success = pytest exit code 0 (no LLM judges)
- **Context Economy**: Codebase designed to exceed context windows, forcing tool use

## Installation

```bash
prime env install ghost-fixer
```

## Quick Start

```bash
# Evaluate with Level 2 difficulty
prime eval run ghost-fixer -m gpt-4.1-mini -n 20 -a '{"level": 2}'

# Train on Level 1 (easiest)
prime eval run ghost-fixer -m my-model -a '{"level": 1, "n_samples": 1000}'
```

## Difficulty Levels

| Level | Depth | Files | Noise | Bug Types | Description |
|-------|-------|-------|-------|-----------|-------------|
| 1 | 1 | 1 | 0 | Constant drift | Single file, direct fix |
| 2 | 3 | ~5 | 0 | Operator/constant | Multi-file tracing |
| 3 | 5 | ~20 | 5 | +Off-by-one | Noise file distractors |
| 4 | 8 | ~100 | 20 | +Logic inversion | Long-horizon reasoning |

## Agent Tools

| Tool | Description |
|------|-------------|
| `run_tests()` | Execute pytest, returns pass/fail + traceback |
| `list_files(path)` | Directory listing |
| `read_file(path)` | File contents with line numbers |
| `edit_file(path, old, new)` | String replacement with syntax validation |

## Reward Structure

| Condition | Reward |
|-----------|--------|
| Tests pass | +10.0 |
| Timeout | -1.0 |
| Crash (broke syntax) | -5.0 |

## Environment Arguments

| Arg | Type | Default | Description |
|-----|------|---------|-------------|
| `level` | int | 2 | Difficulty level (1-4) |
| `n_samples` | int | 100 | Number of problems |
| `max_turns` | int | 20 | Turn limit |
| `seed` | int | 42 | Random seed |

## How It Works

1. **World Generator**: Creates a DAG of Python modules with math function chains
2. **Ghost Injector**: Mutates one function (operator flip, constant drift, etc.)
3. **Agent**: Uses tools to explore, hypothesize, and patch
4. **Verifier**: pytest determines success/failure

```
main.py → utils/helpers.py → core/level0/ops_0.py
                              ↑ BUG HERE
```

## Example Session

```
Agent: run_tests()
> ❌ Tests failed: AssertionError: Expected 50, got 42

Agent: list_files(".")
> main.py  utils/  core/  test_main.py

Agent: read_file("main.py")
> from utils.helpers import transform_2_0_0
> def compute(x):
>     return transform_2_0_0(x) - 1

Agent: read_file("utils/helpers.py")
> ...

Agent: edit_file("core/level0/ops_0.py", "return x - 5", "return x + 5")
> ✅ Successfully replaced

Agent: run_tests()
> ✅ All tests passed!
```

## Metrics

| Metric | Meaning |
|--------|---------|
| `reward` | Final reward (weighted sum) |
| `_test_passed_reward` | 10.0 if fixed, -1.0/-5.0 otherwise |
| `_files_read_metric` | Number of files read |
| `_edits_made_metric` | Number of edits attempted |
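
The mapping from episode outcome to `_test_passed_reward` described in the Reward Structure and Metrics tables can be sketched as below. This is an illustrative sketch, not the environment's actual code: the function name `score_episode`, its parameters, and the rule that any non-crash failure counts as a timeout are assumptions made for the example. Only the three reward values come from the README.

```python
# Reward values taken from the Reward Structure table.
REWARD_TESTS_PASS = 10.0
REWARD_TIMEOUT = -1.0
REWARD_CRASH = -5.0


def score_episode(pytest_exit_code: int, syntax_ok: bool) -> float:
    """Hypothetical scoring helper (name and signature are assumptions).

    pytest_exit_code: exit code of the final pytest run (0 means all tests passed).
    syntax_ok: False if the agent's edits left the code syntactically invalid.
    """
    if not syntax_ok:
        # "Crash (broke syntax)" row of the table.
        return REWARD_CRASH
    if pytest_exit_code == 0:
        # "Tests pass" row: success is defined as pytest exit code 0.
        return REWARD_TESTS_PASS
    # Assumption: an episode that ends with failing tests and valid syntax
    # is treated as having run out of turns ("Timeout" row).
    return REWARD_TIMEOUT


if __name__ == "__main__":
    print(score_episode(0, True))   # 10.0
    print(score_episode(1, False))  # -5.0
    print(score_episode(1, True))   # -1.0
```

Checking for broken syntax before the exit code keeps the crash penalty distinct from an ordinary failed run, which matches the separate -5.0 and -1.0 rows in the table.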