{"data":{"kind":"file","path":"README.md","version_id":"lpi20ngir8x8rfbxx017h7te","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1920,"modified_at":"2026-05-25T19:38:36.965000","content_hash":"10dab076ab5b1938144dd4d8a5ca67cb941f3d52b39bcf9bad0a9620cd761188"},"entries":[],"content":"# backdoor-ifeval-phase-law\n\nPhase-transition research environment for the reward-hacking law project. Extends the vigilant Backdoor-IFEval stack with **advantage-geometry logging** and **boundary-shifting** intervention modes.\n\n**Frozen sprint artifact:** `austindixson/backdoor-ifeval-vigilant` (do not mutate for law tests).\n\n## Purpose\n\nMap when hidden objectives become the easiest optimization path:\n\n- `reward_share_hidden`, `concentration_ratio`\n- `quadrant_*_frac` (HH/HL/LH/LL rollout quadrants per group)\n- Standard vigilant metrics (`hidden_gradient_active`, `vigilance_active`, etc.)\n\nSee [docs/PHASE_TRANSITION_LAW.md](../../docs/PHASE_TRANSITION_LAW.md).\n\n## New intervention modes\n\n| Mode | Mechanism (post-trigger) |\n|------|---------------------------|\n| `group_balance` | Boost visible term, penalize hidden in combined reward |\n| `advantage_clip` | Cap hidden contribution share in combined reward |\n| `counterfactual_norm` | Reduce effective hidden weight + damp hidden score |\n| `data_gating` | Drop hidden-heavy combined signal (proxy for rollout gating) |\n\nBaseline modes: `kill_gradient`, `behavior_penalty`, `visible_constraint`.\n\n## Install\n\n```bash\nprime env install backdoor-ifeval-phase-diagram -p ./environments --plain\n```\n\nAfter hosted validation and push, install from Hub with\n`prime env install austindixson/backdoor-ifeval-phase-law`.\n\nLocal dev (from repo):\n\n```bash\ncd environments/backdoor_ifeval_phase_diagram\npip install -e \".[dev]\"\n```\n\n## Example configs\n\n```bash\nprime train run --yes configs/phase/binary-control.toml\nprime train run --yes configs/phase/binary-group-balance.toml\n```\n\n## Tests\n\n```bash\ncd environments/backdoor_ifeval_phase_diagram\npython -m pytest tests/ -q\n```\n\n## Metrics caveat\n\nPrime may not expose true PPO advantages to the environment. `concentration_ratio` is a **reward-derived proxy** - label claims accordingly ([METRICS.md](../../METRICS.md)).\n","encoding":"utf-8","truncated":false,"total_bytes":1920},"status":null}