{"data":{"kind":"file","path":"README.md","version_id":"vn3biybuelm55chbjh6dyfid","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2735,"modified_at":"2025-09-21T15:28:50.904000","content_hash":"051717623edbe07072b686864a4c81d7135cc18829a279ef8aa18c86e90f3137"},"entries":[],"content":"# ifeval\n\n### Overview\n- **Environment ID**: `ifeval`\n- **Short description**: Single-turn instruction following evaluation using RLVR-IFeval with JSON constraint rewards. Adds explicit <think>...</think> reasoning section before the final answer.\n- **Tags**: ifeval, single-turn, chat, constraints, none-reasoning, train, eval\n\n### Dataset\n- **Source**: `allenai/RLVR-IFeval`\n- **Splits**: Uses `validation` or `test` if present; otherwise creates a train/eval holdout from `train` via `train_test_split` (seed=42).\n- **Fields used**:\n  - `messages`: list of chat messages containing the question\n  - `ground_truth`: JSON string with constraint and args\n\n### Prompting & Schema\n- **System message**: \"Follow the user's instructions exactly. First, think step-by-step inside <think>...</think>. Then, after </think>, provide ONLY your final response that satisfies the user's constraints.\"\n- **User message**: Contains the question from the first user message in the original messages\n- **Example schema per example**:\n  - `prompt`: list of messages `[{\"role\":\"system\",...}, {\"role\":\"user\",...}]`\n  - `answer`: JSON string with constraint and args (ground truth)\n\n### Parser & Rewards\n- **Parser**:\n  - `reasoning=false` (default): None — evaluates the whole assistant message as the answer.\n  - `reasoning=true`: `SingleThinkXMLParser` — evaluates only the section after `</think>` as the final answer.\n- **Rewards**:\n  - IFEvalRubric: JSON-constraint check on the “final answer” text (whole message if `reasoning=false`, or after `</think>` if `reasoning=true`).\n  - When `reasoning=true`: adds shaping (+0.2) for a well-formed single <think>…</think> block with non-empty final answer; otherwise -0.2.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit training set size (`-1` for all) |\n| `num_eval_examples` | int | `-1` | Limit eval set size (`-1` for all) |\n| `reasoning` | bool | `false` | If true, require `<think>…</think>` and score only the post-`</think>` final answer (adds small shaping reward) |\n\n### Quickstart\n\nEvaluate with defaults (uses the env's internal dataset handling):\n\n```bash\nuv run vf-eval ifeval \\\n  -a '{\"num_train_examples\":-1, \"num_eval_examples\":-1, \"reasoning\": true}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Reports (if produced) will be placed under `./environments/ifeval/reports/`.\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval ifeval -a '{\"key\": \"value\"}'</code> to generate one.</p>\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":2735},"status":null}