{"data":{"kind":"file","path":"README.md","version_id":"lrv1q66gybsqoex79n4y041w","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1359,"modified_at":"2026-04-17T06:43:53.773000","content_hash":"e62fb3b03e00f6dc04e5b855440153958e353593a75ce042cb4b12d9c8630247"},"entries":[],"content":"# tau2_infinity_wg\n\nPrime Intellect RL environment — airline customer-service gym backed by world-gen failure artifacts.\n\nAn agent plays the airline-support agent role against an LLM-simulated customer across a stateful tool environment. Each episode runs against a `failure-*.json` artifact that pins the initial world state and a set of objectives the agent must satisfy.\n\n## Rewards\n\nFour components (ToolRL decomposition):\n\n- **outcome** — LLM-as-judge (ORM) scores each objective on a continuous 5-bucket rubric; final score is mean across objectives. See `verifier.py` for the judge determinism treatment.\n- **format** — strict tool-call schema compliance, binary.\n- **efficiency** — tool-call count vs. optimal, decaying.\n- **compliance** — penalties for parallel calls and unauthorized mutations.\n\nSee `rewards.py` for weights and the \"Things explicitly deferred\" section.\n\n## Artifacts\n\nThe `artifacts/` directory holds world-gen outputs — each file is a task the agent must solve. Artifacts are loaded verbatim by `dataset.py`; the env never regenerates them at runtime.\n\n## Models\n\nDefault judge + customer-simulation model: `bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0`. Override via `load_environment(judge_model=..., customer_model=...)`.\n\nRequired secrets (or env vars) at runtime: `AWS_BEARER_TOKEN_BEDROCK`, `AWS_REGION`.\n","encoding":"utf-8","truncated":false,"total_bytes":1359},"status":null}