# reasoning-gym-env

### Overview
- **Environment ID**: `reasoning-gym-env`
- **Short description**: Single-turn evaluation over `reasoning_gym` procedural tasks with XML formatting.
- **Tags**: reasoning, procedural, single-turn, xml, synthetic

### Datasets
- **Primary dataset(s)**: Generated via `reasoning_gym` (e.g., `arc_1d`, a list of task names, or a composite config)
- **Source links**: `reasoning_gym` library
- **Split sizes**: Configurable train/eval counts via `num_train_examples` and `num_eval_examples`

### Task
- **Type**: single-turn
- **Parser**: `XMLParser(["answer"])`
- **Rubric overview**: Score computed by the task-specific `reasoning_gym` scorer, plus an optional format component

### Quickstart
Run an evaluation with default settings:

```bash
uv run vf-eval reasoning-gym-env
```

Configure model and sampling:

```bash
uv run vf-eval reasoning-gym-env \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"gym": "arc_1d", "num_train_examples": 2000, "num_eval_examples": 2000}'
```

Notes:
- Use `gym` to select a single dataset name, a list of names, or a composite specification.
- Reports are written under `./environments/reasoning_gym_env/reports/` and auto-embedded below.

### Environment Arguments
| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `gym` | str \| list | `"arc_1d"` | Single task name, list of names, or composite config |
| `num_train_examples` | int | `2000` | Number of training examples |
| `num_eval_examples` | int | `2000` | Number of evaluation examples |
| `seed` | int | `0` | Random seed for dataset generation |

### Metrics
| Metric | Meaning |
| ------ | ------- |
| `reward` | Task-specific score from `reasoning_gym` for the parsed answer |
| `format_reward` | Adherence to the `<think>`/`<answer>` XML format |
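To make the two metrics concrete, here is a minimal, self-contained sketch of how a single completion might be scored. It does **not** use the `verifiers` `XMLParser` or any `reasoning_gym` scorer: `extract_answer`, `format_reward`, and `exact_match_score` are hypothetical stand-ins, and plain exact string match substitutes for the task-specific scoring that `reasoning_gym` actually performs.

```python
import re
from typing import Optional

# Hypothetical stand-in for parsing the <answer> field (not the verifiers XMLParser API).
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)

# Hypothetical format check: optional <think>...</think>, then <answer>...</answer>,
# with only whitespace allowed before and after.
FORMAT_RE = re.compile(
    r"^\s*(?:<think>.*?</think>\s*)?<answer>.*?</answer>\s*$", re.DOTALL
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> block, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def format_reward(completion: str) -> float:
    """1.0 if the completion starts with an optional <think> block and ends with an <answer> block."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def exact_match_score(answer: Optional[str], target: str) -> float:
    """Placeholder for a reasoning_gym task scorer: exact match after stripping whitespace."""
    if answer is None:
        return 0.0
    return 1.0 if answer.strip() == target.strip() else 0.0


completion = "<think>Shift every block one cell right.</think>\n<answer>0 0 1 1 0</answer>"
print(extract_answer(completion))                                  # 0 0 1 1 0
print(exact_match_score(extract_answer(completion), "0 0 1 1 0"))  # 1.0
print(format_reward(completion))                                   # 1.0
print(format_reward("The answer is 0 0 1 1 0"))                    # 0.0
```

Keeping the answer extraction separate from the format check mirrors the split between `reward` and `format_reward`: a completion can contain a correct `<answer>` yet still lose format credit if extra text surrounds the tags.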