# w0-brpo

This environment is designed to support the BRPO process introduced in the [Writing-Zero paper](https://www.alphaxiv.org/abs/2506.00103):

1. The rollouts are grouped (`group_size` rollouts per group).
2. One answer in the group is selected at random as the reference.
3. Every other answer is compared pairwise with the reference. The reward is 1 if the reference is preferred, -1 if not, and 0 for the reference itself.

### Overview
- **Environment ID**: `w0-brpo`
- **Short description**: Pairwise GenRM environment that computes BRPO rewards by comparing each rollout in a group against a randomly chosen reference.
- **Tags**: GenRM, Pair-wise, GRPO, BRPO

### Datasets
- **Primary dataset(s)**: Synthetically generated and filtered data
- **Source links**: [HF Data](https://huggingface.co/datasets/dmnsh/W0_GenRM)

### Task
- **Type**: multi-turn
- **Rubric overview**: Produces a list of pairwise rewards: 1 or -1 for each compared rollout and 0 for the reference

### Quickstart
`vf-eval` does not currently run for this environment.

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `judge_model` | str | `"gpt-4.1-mini"` | Model used as the judge for pairwise comparisons |
| `judge_base_url` | str | `"https://api.openai.com/v1"` | Base URL for the judge API client |
| `judge_api_key_var` | str | `"OPENAI_API_KEY"` | Name of the environment variable that holds the judge API key |
| `dbs` | List[str] | `['LitBench']` | Datasets to load (e.g., `['LitBench']`) |
| `group_size` | int | `2` | Number of rollouts per group used for the pairwise comparisons |
| `curators` | Union[List[str], List[dict]] | `['glm_4.5_air', {'curator': 'nemotron_nano_9b_v2', 'range': (0, 4)}]` | Curators (data splits) to load; each entry is either a curator name or a dict with a `curator` name and a `range` |

### Metrics
| Metric | Meaning |
| ------ | ------- |
| `reward` | List of rewards from the pairwise comparisons |
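The three-step reward scheme described at the top of this README can be sketched as below. This is a minimal illustration under stated assumptions, not this environment's actual implementation: `split_into_groups`, `brpo_rewards`, and the `compare` callable are hypothetical names, `compare` stands in for the call to the judge model (`judge_model`), and grouping consecutive rollouts into chunks of `group_size` is an assumption about how groups are formed.

```python
import random
from typing import Callable, List, Optional, Sequence


def split_into_groups(rollouts: Sequence[str], group_size: int) -> List[List[str]]:
    """Step 1 (hypothetical helper): chunk consecutive rollouts into groups.

    Assumes groups are contiguous chunks of `group_size`; the last group may be smaller.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [list(rollouts[i:i + group_size]) for i in range(0, len(rollouts), group_size)]


def brpo_rewards(
    group: Sequence[str],
    compare: Callable[[str, str], int],
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Steps 2 and 3: pick a random reference and score every other answer against it.

    `compare(answer, reference)` is a hypothetical stand-in for the judge call.
    It must return 1 or -1, following the sign convention stated in this README;
    the reference itself always receives 0.
    """
    if not group:
        return []
    if rng is None:
        rng = random.Random()
    ref_idx = rng.randrange(len(group))  # step 2: reference chosen uniformly at random
    reference = group[ref_idx]

    rewards: List[int] = []
    for i, answer in enumerate(group):
        if i == ref_idx:
            rewards.append(0)  # step 3: the reference is not compared with itself
            continue
        score = compare(answer, reference)
        if score not in (1, -1):
            raise ValueError(f"compare must return 1 or -1, got {score!r}")
        rewards.append(score)
    return rewards


if __name__ == "__main__":
    # Toy comparator for illustration only: returns 1 when the reference is
    # at least as long as the answer, -1 otherwise. A real run would query the judge.
    def toy_compare(answer: str, reference: str) -> int:
        return 1 if len(reference) >= len(answer) else -1

    for group in split_into_groups(["a", "bb", "ccc", "dddd"], group_size=2):
        print(group, brpo_rewards(group, toy_compare, random.Random(0)))
```

Passing a seeded `random.Random` makes the reference choice reproducible, which is convenient when debugging judge prompts; leaving `rng` unset draws a fresh reference each time.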