{"data":{"kind":"file","path":"README.md","version_id":"z55jkujyvxc8p5otnpj85rtx","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1473,"modified_at":"2026-06-05T19:13:25.615000","content_hash":"0056a6e98e281dffd201350a81239f75263d18ba791f38bc0ef635c74217e43d"},"entries":[],"content":"# hosted-online-eval-shape-probe\n\n### Overview\n- **Environment ID**: `hosted-online-eval-shape-probe`\n- **Short description**: Minimal legacy `MultiTurnEnv` probe for hosted online eval reward completion shape.\n- **Tags**: diagnostic, train, eval, multi-turn\n\n### Datasets\n- **Primary dataset(s)**: One inline row.\n- **Source links**: None.\n- **Split sizes**: 1 train row; eval falls back to train like legacy environments without `eval_dataset`.\n\n### Task\n- **Type**: legacy multi-turn\n- **Output format expectations**: Reply with `<answer>OK</answer>`.\n- **Rubric overview**: Logs the completion shape, reward kwargs, state keys, and sampling/generation fields passed to reward scoring, then returns a sentinel reward.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run hosted-online-eval-shape-probe\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run hosted-online-eval-shape-probe   -m openai/gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7\n```\n\nNotes:\n- This is diagnostic-only. It is not an ablation environment and should not be used for training claims.\n- Hosted GRPO jobs may still fail after the diagnostic eval if all train-rollout rewards have zero variance; the eval env-server logs are the artifact this probe is meant to produce.\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `shape_reward` | `0.789` for rendered non-null completion, `0.456` for list messages with null content, `0.123` for `completion is None` |\n","encoding":"utf-8","truncated":false,"total_bytes":1473},"status":null}