# ifbench

<a href="https://github.com/PrimeIntellect-ai/research-environments/tree/main/environments/ifbench">
<img src="https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white" alt="Source Code">
</a>

### Overview
- **Environment ID**: `ifbench`
- **Short description**: IFBench evaluation environment
- **Tags**: single-turn, if, eval

### Datasets
- **Primary dataset(s)**: `allenai/IFBench_test`
- **Source links**: [HF](https://huggingface.co/datasets/allenai/IFBench_test), [GitHub](https://github.com/allenai/IFBench)
- **Split sizes**: 300 samples

### Task
- **Type**: single-turn, if, eval
- **Parser**: `MaybeThinkParser`
- **Rubric overview**: `followed_instructions_rate` (fraction of instructions followed), `num_instructions` (number of instructions in the prompt), `followed_instructions` (whether every instruction was followed)

### Quickstart
Run an evaluation with default settings:

```bash
prime eval run ifbench
```

### Environment Arguments
| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `dataset_name` | str | `allenai/IFBench_test` | Name of the HF dataset to use |
| `dataset_subset` | str | `default` | Subset of the HF dataset to use |
| `dataset_split` | str | `train` | Split of the HF dataset to use |
| `mode` | str | `"loose"` | Evaluation mode: `"loose"` for loose evaluation or `"strict"` for strict evaluation |
| `system_prompt` | str or `None` | `None` | System prompt shown to the model |

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `followed_instructions_rate` | Fraction of instructions followed (weight: 0) |
| `num_instructions` | Number of instructions to follow (weight: 0) |
| `followed_instructions` | Whether all instructions were followed (weight: 1) |
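The three metrics are all derived from the same per-instruction results. The following is a minimal sketch of how they relate, assuming each instruction in a sample has already been checked and reduced to a pass/fail flag, and assuming the reward is a weighted sum using the weights listed in the Metrics table. `IFBenchScores`, `score_sample`, `WEIGHTS`, and `reward` are hypothetical names for illustration, not the environment's API.

```python
from dataclasses import dataclass


@dataclass
class IFBenchScores:
    followed_instructions_rate: float  # fraction of instructions followed
    num_instructions: int  # how many instructions the prompt carried
    followed_instructions: float  # 1.0 only if every instruction was followed


def score_sample(per_instruction_passed: list[bool]) -> IFBenchScores:
    """Aggregate per-instruction pass/fail flags into the three metrics.

    Hypothetical helper: the real environment obtains the flags from the
    IFBench instruction checkers; here they are passed in directly.
    """
    n = len(per_instruction_passed)
    if n == 0:
        raise ValueError("sample has no instructions")
    n_followed = sum(per_instruction_passed)
    return IFBenchScores(
        followed_instructions_rate=n_followed / n,
        num_instructions=n,
        followed_instructions=1.0 if n_followed == n else 0.0,
    )


# Weights from the Metrics table: only `followed_instructions` contributes.
WEIGHTS = {
    "followed_instructions_rate": 0.0,
    "num_instructions": 0.0,
    "followed_instructions": 1.0,
}


def reward(scores: IFBenchScores) -> float:
    """Weighted sum of the metrics (assumed linear combination)."""
    return sum(w * getattr(scores, name) for name, w in WEIGHTS.items())


if __name__ == "__main__":
    partial = score_sample([True, False, True])
    print(partial, reward(partial))
```

With these weights, a sample that follows two of its three instructions reports a `followed_instructions_rate` of about 0.67 but earns a reward of 0, because only the all-or-nothing `followed_instructions` metric is weighted; the other two are logged for analysis only.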
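The `mode` argument in the Environment Arguments table switches between strict and loose checking. The sketch below assumes the `"loose"` mode follows the loose-scoring convention popularized by IFEval: strict mode checks the raw response, while loose mode also accepts the response if any lightly edited variant passes (first line dropped, last line dropped, both dropped, and each of those with `*` markdown characters removed). `loose_variants` and `is_followed` are hypothetical helpers, and `check` stands in for a single instruction checker; consult the IFBench source for the exact transformations it applies.

```python
from typing import Callable


def loose_variants(response: str) -> list[str]:
    """Candidate rewrites of a response (assumed IFEval-style loose scoring)."""
    lines = response.split("\n")
    remove_first = "\n".join(lines[1:]).strip()
    remove_last = "\n".join(lines[:-1]).strip()
    remove_both = "\n".join(lines[1:-1]).strip()
    variants = [response, remove_first, remove_last, remove_both]
    # Also try each variant with markdown emphasis asterisks stripped.
    variants += [v.replace("*", "") for v in variants]
    return variants


def is_followed(
    response: str, check: Callable[[str], bool], mode: str = "loose"
) -> bool:
    """Return True if `check` accepts the response under the given mode."""
    if mode == "strict":
        # Strict: an empty response never counts; otherwise check it as-is.
        return bool(response.strip()) and check(response)
    if mode == "loose":
        # Loose: pass if any non-empty variant satisfies the checker.
        return any(bool(v.strip()) and check(v) for v in loose_variants(response))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    resp = "Sure, here it is:\nHELLO WORLD\nHope this helps!"
    all_caps = lambda r: r.isupper()  # hypothetical "answer in all caps" checker
    print(is_followed(resp, all_caps, "strict"), is_followed(resp, all_caps, "loose"))
```

In the example, the preamble and sign-off lines make the strict check fail, while loose mode passes because the variant with both lines removed is `"HELLO WORLD"`. This is why loose scores are typically at least as high as strict scores for the same responses.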