{"data":{"kind":"file","path":"README.md","version_id":"ahao5ygbeadduzw7ssdi0a1c","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1066,"modified_at":"2025-09-10T10:26:09.886000","content_hash":"76a2296fa6efa95e7ef9e13fd4348104ac726d5eac8e852e599ba75294539a74"},"entries":[],"content":"# group-metrics\n\nWrapper environment that adds group-level statistics to any base environment's rubric.\n\n## Usage\n\n```toml\n[environment]\nid = \"group-metrics\"\n\n[environment.args]\nbase_env = \"alphabet-sort\"  # Any existing environment\n# ... pass through any base environment args\n```\n\n## What it does\n\nWraps any existing environment to add 4 additional metrics with zero weight:\n- `group_mean` - Average score within prompt group\n- `group_max` - Best score within prompt group  \n- `group_min` - Worst score within prompt group\n- `group_size` - Number of rollouts in prompt group\n\nRollouts are grouped by prompt content hash. Useful for analyzing performance variation.\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `base_env` | str | Required | Base environment to wrap |\n\nAll other arguments should be passed through to the base environment.\n\n## Example\n\n```bash\nuv run vf-eval group-metrics -a '{\"base_env\": \"alphabet-sort\", \"max_turns\": 2}'\n```\n\nWorks with any base environment that has reward functions.","encoding":"utf-8","truncated":false,"total_bytes":1066},"status":null}