{"data":{"kind":"file","path":"README.md","version_id":"q2sgdb2ztcilcyjt8uvzfy3r","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2043,"modified_at":"2026-05-27T23:29:59.216000","content_hash":"bd734737e0301afaf3d8cf23bc30f3e89ea874449aff956be3620a35f5e7ffd6"},"entries":[],"content":"# apex-shortlist\n\n### Overview\n- **Environment ID**: `apex-shortlist`\n- **Short description**: MathArena Apex Shortlist problems evaluated with a v1 single-turn taskset and boxed final-answer reward.\n\n### Datasets\n- **Primary dataset(s)**: [MathArena/apex-shortlist](https://huggingface.co/datasets/MathArena/apex-shortlist), licensed under CC BY-NC-SA 4.0\n- **Split sizes**: Defaults to split `train` (N=48)\n\n### Task\n- **Type**: v1 `Taskset` + base `Harness`\n- **System prompt**: `Put your final answer within \\boxed{}.` from the MathArena Apex Shortlist config.\n- **Parser**: `MaybeThinkParser` wrapping `extract_boxed_answer`\n- **Reward overview**: Symbolic equivalence on the parsed boxed answer using `math-verify`.\n- **Reference config**: [MathArena APEX Shortlist 2025](https://github.com/eth-sri/matharena/blob/main/configs/competitions/apex/shortlist_2025.yaml)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run apex-shortlist\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run apex-shortlist \\\n  -m gpt-5.5 \\\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `system_prompt` | str or None | `\"Put your final answer within \\\\boxed{}.\"` | System prompt shown to the model |\n| `max_turns` | int or None | `None` | Accepted for compatibility; rollouts stay single-turn |\n| `config` | `vf.EnvConfig` or dict or None | `None` | v1 environment config |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | 1.0 if parsed boxed answer is symbolically equivalent to target, else 0.0 |\n\n### Changelog\n\n#### v0.1.1\n- Load via the current Verifiers V1 taskset/config shape.\n- Remove `dataset_name` and `dataset_split` loader/config options; the MathArena Apex Shortlist dataset and `train` split are fixed.\n- Accept `max_turns` only for caller compatibility and force rollouts to stay single-turn.\n- Require `verifiers>=0.1.15.dev11`.\n","encoding":"utf-8","truncated":false,"total_bytes":2043},"status":null}