{"data":{"kind":"file","path":"README.md","version_id":"zed6xkwkulexsm0ar31i4uw1","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2351,"modified_at":"2025-08-28T03:41:59.275000","content_hash":"f1899b6b94592b19081960823107f6145185404a4414500d8a346997cfef6fb4"},"entries":[],"content":"# ifeval-confusables\n\nThis is a proof of concept for adversarially augmenting an environment with unicode confusables. It takes the ifbench environment and applies character-level perturbations to the input text.\n\n### Overview\n\n- **Environment ID**: `ifeval-confusables`\n- **Short description**: Single-turn instruction following evaluation using RLVR-IFeval dataset with JSON constraint rewards and no reasoning required... except that the input data contains unicode confusables.\n- **Tags**: ifeval, single-turn, chat, constraints, none-reasoning, train, eval, adversarial-robustness\n\n### Dataset\n\n- **Source**: `allenai/RLVR-IFeval`\n- **Splits**: Uses `validation` or `test` if present; otherwise creates a train/eval holdout from `train` via `train_test_split` (seed=42).\n- **Fields used**:\n  - `messages`: list of chat messages containing the question\n  - `ground_truth`: JSON string with constraint and args\n\n### Prompting & Schema\n\n- **System message**: \"Answer the following question. /no_think\"\n- **User message**: Contains the question from the first user message in the original messages, with unicode confusables applied.\n- **Example schema per example**:\n  - `prompt`: list of messages `[{\"role\":\"system\",...}, {\"role\":\"user\",...}]`\n  - `answer`: JSON string with constraint and args (ground truth)\n\n### Parser & Rewards\n\n- **Parser**: None (uses IFEvalRubric directly)\n- **Rewards**: Uses IFEvalRubric for evaluation with JSON constraint checking\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit training set size (`-1` for all) |\n| `num_eval_examples` | int | `-1` | Limit eval set size (`-1` for all) |\n\n### Quickstart\n\nEvaluate with defaults (uses the env's internal dataset handling):\n\n```bash\nuv run vf-eval ifeval-confusables \\\n  -a '{\"num_train_examples\":-1, \"num_eval_examples\":-1}'\n```\n\nNotes:\n\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Reports (if produced) will be placed under `./environments/ifeval-confusables/reports/`.\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval ifeval-confusables -a '{\"key\": \"value\"}'</code> to generate one.</p>\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":2351},"status":null}